Abstract

In this paper, we present an approach to fault-tolerant synthesis that combines predefined fault-tolerance patterns
with algorithmic game solving. A non-fault-tolerant system, together with the relevant fault hypothesis and a pool of
fault-tolerant mechanism templates, is translated into a distributed game, and we perform an incomplete search for strategies to
cope with undecidability. The result of the game is translated back to executable code, and the fault-tolerant mechanisms are concretized
using constraint solving. The overall approach is implemented in a prototype tool chain and is illustrated using examples.

I. Introduction 

In this paper, we investigate methods to perform automatic fault-tolerant (FT for short) synthesis in the context
of embedded systems, where our goal is to generate executable code that can be deployed on dedicated hardware
platforms.

Creating a tool that supports this fully automated process is challenging, as the inherent complexity is high:
bringing FT synthesis from theory to practice means solving a problem that involves (a) interleaving semantics, (b) timing,
(c) fault tolerance, (d) dedicated features of concrete hardware, and optionally (e) the code generation framework. To
obtain tractable results, we first restrict our problem space to simple yet reasonable scenarios (Sec. II). Based
on these scenarios, we then model the system (Sec. III), taking into account all of the aspects mentioned above.

To proceed further, it is important to understand current engineering practice. For engineers
working on the fault tolerance of a system, once the fault model is decided, a common approach
is to select some fault-tolerant patterns [14] (e.g., fragments of executable code) from a pattern pool. Engineers then
must fine-tune these mechanisms, or fill in unspecified information in the patterns, to make them work as expected.
With this scenario in mind, apart from generating complete FT mechanisms from a specification, our synthesis
technique emphasizes the automatic selection of predefined FT patterns and their automatic tuning, such that details (e.g., timing)
are filled in without human intervention. This also mitigates the problem of synthesizing unwanted FT mechanisms
due to under-specification. Accordingly, we translate the system model, the fault hypothesis, and
the set of available FT patterns into a distributed game [18] (Sec. V), and a strategy generated by the game solver can
be interpreted as a selection of FT patterns together with guidelines for tuning them.

For games, it is known that solving distributed games is, in most cases, undecidable [18]. To cope with undecidability,
we restrict ourselves to finding positional strategies (mainly for reachability games). We argue that finding
positional strategies is still practical, as the selected FT patterns may introduce additional memory during game creation.
Hence, a positional strategy (i.e., a pattern selection) combined with the selected FT patterns yields mechanisms that use
memory. Under this restriction, the problem of finding a strategy of the game (for control) is NP-complete (Sec. VI),
and search techniques (e.g., SAT translation, or forward search combined with BDDs) are applied to
find solutions.

The final step of the automated process is to translate the result of synthesis back to a concrete implementation: the
main focus is to ensure that the newly synthesized mechanisms do not affect the implementability of the original
system (i.e., the new system remains schedulable). Based on our modeling framework, this problem can be translated into a
linear constraint system, which can be solved efficiently by existing tools.

To evaluate our methods, we have implemented a prototype tool that uses a model-based approach to
facilitate the design, synthesis, and code generation of fault-tolerant embedded systems. We demonstrate two small
yet representative examples with our tool as a proof of concept (Sec. VIII); these examples indicate the applicability of
the approach. Lastly, we conclude this paper with an overview of related work (Sec. IX) and a brief summary including
the flow of our approach (Sec. X).

II. Motivating Scenario 

A. Adding FT Mechanisms to Resist Message Loss 

We give a motivating scenario from embedded systems to illustrate our mathematical definitions. The simple system
described in Figure 1 contains two processes A, B and one bidirectional network $\mathcal{N}$. Processes A and B repeatedly execute
sequences of actions with a common period of 100ms. In each period, A first reads an input from a sensor into
variable m, then sends the value to the network $\mathcal{N}$ using the action MsgSend(m), and finally outputs the value
(e.g., to a log).

Figure 1. An example of two processes communicating over an unreliable network.
    Process A (Period = 100ms; m ∈ {T, F}):
        InputRead(m);
        MsgSend(m)[0ms, 40ms);
        PrintOut(m);
    Network N
    Process B (Period = 100ms; m ∈ {T, F}, m_v ∈ {T, ⊥}):
        RecvMsg(m)[60ms, 100ms);
        PrintOut(m);
        m_v ← ⊥;

In process A, for the action MsgSend(m), a message containing the value of m is forwarded to $\mathcal{N}$, which broadcasts
the value to all other processes that contain a variable named m and sets the variable $m_v$ in B to T (indicating that
the content is valid). However, A is unaware of whether the message has been delivered successfully: the network component
$\mathcal{N}$ is unreliable and exhibits the faulty behavior of message loss. The fault type and the frequency of the faulty behavior are
specified in the fault model: in this example, at most one message loss can occur in every complete period (100ms).

In B, the first action RecvMsg(m) is annotated with the interval [60, 100), which specifies the release time and
deadline of this action as 60ms and 100ms, respectively. By imposing this release time and deadline,
B can decide whether it has received the message m successfully by checking the equality constraint $m_v = \bot$,
provided that the time interval [40, 60) between (a) the deadline of MsgSend(m) and (b) the release time of RecvMsg(m)
overestimates the worst-case transmission time for a message to travel from A to B. After RecvMsg(m), B outputs
the received value (e.g., to an actuator).

Due to the unreliable network, the two output values may differ. Thus, the fault-tolerant
synthesis problem in this example is to modify A and B suitably, such that the output values of A
and B are the same at the end of each period, regardless of disturbances from the network.
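The following Python sketch enumerates every behavior of one period that the fault model allows, to show why a modification is needed. It is a toy under stated assumptions: timing and the validity flag $m_v$ are abstracted away, B is assumed to keep its previous value of m when no message arrives, and "send the message twice" is a hypothetical FT pattern chosen for illustration, not necessarily what our synthesis selects.

```python
from itertools import product


def run_period(sensor_value, b_old_m, losses, copies=1):
    """One period of the Figure 1 system with timing abstracted away.

    A reads `sensor_value` into m and sends `copies` copies of it; losses[k] is
    True if the k-th copy is dropped by the network. B keeps its previous value
    of m (`b_old_m`) unless some copy is delivered. Returns (A's output, B's output).
    """
    a_m = sensor_value              # A: InputRead(m)
    b_m = b_old_m                   # B's m before RecvMsg(m)
    for k in range(copies):         # A: MsgSend(m), repeated `copies` times
        if not losses[k]:           # delivered: B.m := A.m
            b_m = a_m
    return a_m, b_m                 # values printed by A and B


def outputs_agree(copies, max_losses=1):
    """True iff A and B print the same value in every scenario allowed by the
    fault model (at most `max_losses` lost messages per period)."""
    for sensor, old in product([True, False], repeat=2):
        for losses in product([False, True], repeat=copies):
            if sum(losses) > max_losses:
                continue
            a_out, b_out = run_period(sensor, old, losses, copies)
            if a_out != b_out:
                return False
    return True


print(outputs_agree(copies=1))  # False: a single loss leaves B with a stale m
print(outputs_agree(copies=2))  # True: at most one of the two copies is lost
```

In a real system, the second copy must also fit into the timing windows of A and B, which is why the synthesis has to reason about timing and not only about the pattern itself.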

B. Solving Fault-Tolerant Synthesis by Instrumenting Primitives 

To perform FT synthesis in the example above, our method introduces several slots (the number of slots is fixed by
the designer) between the actions originally specified in the system. In each slot, an atomic operation can be instrumented,
chosen from a pool of predefined fault-tolerant primitives consisting of message sending, message
receiving, local variable modification, and null-ops. Under this setting, we obtain a game: the original
transitions of the fault-intolerant system, combined with all available FT primitives, constitute the controller (player-0)
moves, while the triggering of faults and the network behavior are modeled as environment (player-1) moves.

III. System Modeling 

A. Platform Independent System Execution Model 

We first define the execution model, which includes timing information; it is used for specifying embedded systems
and is linked to our code-generation framework. For ease of understanding, we also give each term in the definition an
intuitive explanation.

Definition 1: Define the syntax of the Platform-Independent System Execution Model (PISEM) as $\mathcal{S} = (\mathcal{A}, \mathcal{N}, T)$.

• $T \in \mathbb{Q}$ is the replication period of the system.

• $\mathcal{A} = \bigcup_{i=1\ldots n_A} A_i$ is the set of processes, where $A_i = (V_i \cup V_{env_i}, \sigma_i)$ and

• $V_i$ is the set of variables, and $V_{env_i}$ is the set of environment variables. For simplicity, assume that $V_i$ and $V_{env_i}$
are of integer domain.

• $\sigma_i := \sigma_1[\alpha_1, \beta_1); \ldots; \sigma_j[\alpha_j, \beta_j); \ldots; \sigma_{k_i}[\alpha_{k_i}, \beta_{k_i})$ is a sequence of actions.

  - $\sigma_j := \mathrm{send}(pre, index, n, s, d, v, c) \mid a \leftarrow e \mid \mathrm{receive}(pre, c)$ is an atomic action (action pattern), where

    * $a, c \in V_i$,

    * $e$ is a function from $V_{env_i} \cup V_i$ to $V_i$ (this includes null-op),

    * $pre$ is a conjunction of equalities/inequalities over variables,

    * $s, d \in \{1, \ldots, n_A\}$ represent the source and destination,

    * $v \in V_d$ is the variable that is expected to be updated in process $d$,

    * $n \in \{1, \ldots, n_N\}$ is the network used for sending, and

    * $index \in \{1, \ldots, size_n\}$ is the index of the message used in the network.

  - $[\alpha_j, \beta_j)$ is the execution interval, where $\alpha_j \in \mathbb{Q}$ is the release time and $\beta_j \in \mathbb{Q}$ is the deadline.

• $\mathcal{N} = \bigcup_{i=1\ldots n_N} N_i$, $N_i = (\mathcal{T}_i, size_i)$, is the set of networks.

• $\mathcal{T}_i : \mathbb{N} \rightarrow \mathbb{Q}$ is a function that maps the index (or priority) of a message to the worst-case message
transmission time (WCMTT).

• $size_i$ is the number of messages used in $N_i$.
[Example] Based on the above definitions, the system in Section II-A can be easily modeled as a PISEM:
let A, B, and $\mathcal{N}$ in Section II-A be renamed in the PISEM as $A_1$, $A_2$, and $N_1$. For simplicity, we use $A_i.j$ to represent
the variable $j$ in process $A_i$, assume that the network transmission time is 0, and let $V_{env_1}$ contain only one variable $v$ in
$A_1$. Then in the modeled PISEM, we have $N_1 = (\mathcal{T}_1, 1)$, where $\mathcal{T}_1$ maps every message index to 0, $T = 100$, and the action sequence of process $A_1$ is

$m \leftarrow \mathrm{InputRead}(v)[0, 40);\ \mathrm{send}(\mathit{true}, 1, 1, 1, 2, m, A_1.m)[0, 40);\ v \leftarrow \mathrm{PrintOut}(m)[40, 100);$

For convenience, we use $|\sigma_i|$ to represent the length of the action sequence $\sigma_i$, $\sigma_j.deadline$ to represent the deadline of
$\sigma_j$, and $iSet(\sigma_i)$ to represent the set containing (a) the subscript numbers in $\sigma_i$ and (b) $|\sigma_i| + 1$, i.e., $\{1, \ldots, k_i, k_i + 1\}$.
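A minimal Python sketch of $iSet(\sigma_i)$ and of the update $A_{next_i} := \min\{x \mid x \in iSet(\sigma_i), x > j\}$ used by the operations below. The function names and the optional `inserted` argument are our own (hypothetical); `inserted` anticipates the rational subscripts of FT actions introduced in Section V-B, and `Fraction` keeps those subscripts exact.

```python
from fractions import Fraction


def iset(k, inserted=()):
    """iSet(sigma_i) for a sequence of k actions: {1, ..., k, k + 1},
    plus any rational subscripts of inserted FT actions."""
    return sorted(set(range(1, k + 2)) | set(inserted))


def next_index(k, j, inserted=()):
    """New value of A_next_i after executing the action with subscript j.
    If j = k + 1 (all actions of this cycle executed), nothing changes."""
    if j >= k + 1:
        return j
    return min(x for x in iset(k, inserted) if x > j)


# Process A_1 of the example has k = 3 actions.
print(iset(3))                             # [1, 2, 3, 4]
print(next_index(3, 3))                    # 4, i.e. |sigma_1| + 1
print(next_index(3, 2, [Fraction(5, 2)]))  # 5/2: an inserted FT slot runs next
```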

Definition 2: The configuration of $\mathcal{S}$ is $(\bigwedge_{i=1\ldots n_A} (v_i, v_{env_i}, A_{next_i}), \bigwedge_{j=1\ldots n_N} (occu_j, s_j, d_j, var_j, c_j, t_j, ind_j), t)$,
where

• $v_i$ is the set of current values for the variable set $V_i$,

• $v_{env_i}$ is the set of current values for the variable set $V_{env_i}$,

• $A_{next_i} \in [1, |\sigma_i| + 1]$ is the index of the next atomic action to be taken in $A_i$,

• $occu_j \in \{false, true\}$ indicates whether the network is busy,

• $s_j, d_j \in \{1, \ldots, n_A\}$ are the source and destination processes, and $var_j$ is the variable to be updated in the destination,

• $c_j \in \mathbb{Z}$ is the content of the message,

• $ind_j \in \{1, \ldots, size_j\}$ is the index of the message occupying the network,

• $t_j$ is the reading of the clock used to measure the time spent on transmission,

• $t$ is the current reading of the global clock.

The change of configuration is caused by the following operations.

1) (Execute local action) For machine $i$, let $s$ and $j$ be the current values of $var$ and $A_{next_i}$, and let $v_i$, $v_{env_i}$
be the current values of $V_i$ and $V_{env_i}$. If $j = |\sigma_i| + 1$, then do nothing (all actions in $\sigma_i$ have been executed in
this cycle); otherwise the action $\sigma_j := var \leftarrow e\,[\alpha_j, \beta_j)$ updates $var$ from $s$ to $e(v_i, v_{env_i})$ and changes $A_{next_i}$ to
$\min\{x \mid x \in iSet(\sigma_i), x > j\}$. This action must be executed within the time interval $t \in [\alpha_j, \beta_j)$.

2) (Send to network) For machine $i$, let $j$ be the current value of $A_{next_i}$. If $j = |\sigma_i| + 1$, then
do nothing; otherwise the action $\sigma_j := \mathrm{send}(pre, index, n, s, d, v, c)[\alpha_j, \beta_j)$ is processed within the time
interval $t \in [\alpha_j, \beta_j)$ and changes $A_{next_i}$ to $\min\{x \mid x \in iSet(\sigma_i), x > j\}$.

• When $pre$ evaluates to true (it can be viewed as an if statement), the action checks the condition $occu_n = false$:
if the condition holds, it updates network $n$ with $(occu_n, s_n, d_n, var_n, c_n, t_n, ind_n) := (true, i, d, v, c, 0, index)$;
otherwise it blocks until the condition holds.

• When $pre$ evaluates to false, the sending is skipped.

3) (Process message) For network $j$ with configuration $(occu_j, s_j, d_j, var_j, c_j, t_j, ind_j)$, if $occu_j = true$, then at some time with
$t_j < \mathcal{T}_j(ind_j)$ a transmission occurs, which updates $occu_j$ to false, $A_{d_j}.var_j$ to $c_j$, and the validity flag $A_{d_j}.{var_j}_v$ to true.

4) (Receive) For machine $i$, let $j$ be the current value of $A_{next_i}$. If $j = |\sigma_i| + 1$, then do nothing;
otherwise $\mathrm{receive}(pre, c)[\alpha_j, \beta_j)$ in machine $i$ is processed within the time interval $t \in [\alpha_j, \beta_j)$ and changes
$A_{next_i}$ to $\min\{x \mid x \in iSet(\sigma_i), x > j\}$.

5) (Repeat Cycle) When $t = T$, $t$ is reset to 0, and for all $x \in \{1, \ldots, n_A\}$, $A_{next_x}$ is reset to 1.
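To illustrate rules 2) and 3), here is a Python sketch of the network part of a configuration. It is a simplification under our own assumptions: the clocks $t$ and $t_j$ are not advanced, execution windows $[\alpha_j, \beta_j)$ are not enforced, process variables are stored in dictionaries, and the validity flag of a variable `var` is named `var_v` (following $m_v$ in Figure 1). All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Network:
    """Network configuration (occu, s, d, var, c, t, ind) of Definition 2."""
    occu: bool = False
    s: int = 0
    d: int = 0
    var: str = ""
    c: int = 0
    t: float = 0.0
    ind: int = 0


def try_send(net, pre, index, i, d, v, c):
    """Rule 2 for process i: returns 'skipped', 'blocked', or 'sent'."""
    if not pre:
        return "skipped"              # pre is false: skip the sending
    if net.occu:
        return "blocked"              # wait until occu = false
    net.occu, net.s, net.d, net.var = True, i, d, v
    net.c, net.t, net.ind = c, 0.0, index
    return "sent"


def deliver(net, wcmtt, procs):
    """Rule 3: complete the pending transmission if t < T(ind).
    procs[d] holds the variables of process d; var + '_v' is its validity flag."""
    if not net.occu or net.t >= wcmtt[net.ind]:
        return False
    net.occu = False
    procs[net.d][net.var] = net.c
    procs[net.d][net.var + "_v"] = True
    return True


procs = {1: {"m": 1}, 2: {"m": 0, "m_v": False}}
net = Network()
print(try_send(net, True, 1, 1, 2, "m", procs[1]["m"]))   # sent
print(try_send(net, True, 1, 1, 2, "m", procs[1]["m"]))   # blocked
print(deliver(net, {1: 15.0}, procs), procs[2])           # True {'m': 1, 'm_v': True}
```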

Note that by using this model to represent the embedded system under analysis, we make the following assumptions:

• All processes and networks in $\mathcal{S}$ share a globally synchronized clock. This assumption can be fulfilled on
many hardware platforms, e.g., with components implementing the IEEE 1588 [11] protocol.

• For all actions $\sigma$, $\sigma.deadline \le T$; for all send actions $\sigma := \mathrm{send}(pre, index, n, s, d, v, c)$, $\sigma.deadline +
\mathcal{T}_n(index) \le T$, i.e., all processes and networks must finish their work within one complete cycle.
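The cycle assumption can be checked mechanically. In this Python sketch (the `Act` record and the function name are hypothetical), we read the bound as non-strict, $\le T$, since the example above uses deadlines equal to $T = 100$ together with half-open windows $[\alpha, \beta)$.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Act:
    label: str
    release: float                         # alpha_j
    deadline: float                        # beta_j
    msg: Optional[Tuple[int, int]] = None  # (network n, index) for send actions


def check_cycle_assumption(T, procs, wcmtt):
    """procs: list of action sequences; wcmtt[(n, index)] = T_n(index).
    Returns the labels of actions that do not finish within one cycle."""
    bad = []
    for seq in procs:
        for a in seq:
            finish = a.deadline
            if a.msg is not None:          # a send also occupies the network
                finish += wcmtt[a.msg]
            if finish > T:
                bad.append(a.label)
    return bad


A1 = [Act("InputRead", 0, 40), Act("send", 0, 40, msg=(1, 1)), Act("PrintOut", 40, 100)]
print(check_cycle_assumption(100, [A1], {(1, 1): 0}))   # []
print(check_cycle_assumption(100, [A1], {(1, 1): 70}))  # ['send']
```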

B. Interleaving Model (IM) 

Next, we introduce the interleaving model (IM), which offers an intermediate representation bridging
PISEM and game solving, such that (a) it captures the execution semantics of PISEM without explicit timing statements,
and (b) it connects more easily to the standard representation of games.
Definition 3: Define the syntax of the Interleaving Model (IM) as $\mathcal{S}_{IM} = (\mathcal{A}, \mathcal{N})$.
• $\mathcal{A} = \bigcup_{i=1\ldots n_A} A_i$ is the set of processes, where $A_i = (V_i \cup V_{env_i}, \sigma_i)$ and
• $V_i$ is the set of variables, and $V_{env_i}$ is the set of environment variables.

1 Here an interval $[1, |\sigma_i| + 1]$ is used for the introduction of FT mechanisms described later.

2 In our formulation, the receive(pre, c) action can be viewed as syntactic sugar for a null-op; its purpose is to facilitate matching
send-receive pairs with variable c.



• $\sigma_i := \sigma_1[\bigwedge_{m=1\ldots n_A} [pc_{1,m,low}, pc_{1,m,up})]; \ldots; \sigma_j[\bigwedge_{m=1\ldots n_A} [pc_{j,m,low}, pc_{j,m,up})]; \ldots; \sigma_{k_i}[\bigwedge_{m=1\ldots n_A} [pc_{k_i,m,low}, pc_{k_i,m,up})]$
is a fixed sequence of actions.

  - $\sigma_j := \mathrm{send}(pre, index, n, s, d, v, c) \mid \mathrm{receive}(pre, c) \mid a \leftarrow e$ is an atomic action, where $a, c, e, pre, v, n,
s, d$ are defined as in PISEM.

  - For $\sigma_j$ and all $m \in \{1, \ldots, n_A\}$, $pc_{j,m,low}, pc_{j,m,up} \in \{1, \ldots, |\sigma_m| + 2\}$ are the lower and upper bounds (PC-precondition
interval) concerning

    1) the precondition on the program counter of machine $m$, when $m \neq i$;

    2) the precondition on its own program counter, when $m = i$.

• $\mathcal{N} = \bigcup_{i=1\ldots n_N} N_i$, $N_i = (\mathcal{T}_i, size_i)$, is the set of networks.

• $\mathcal{T}_i : \mathbb{N} \rightarrow \bigwedge_{m=1\ldots n_A} [\{1, \ldots, |\sigma_m| + 2\}, \{1, \ldots, |\sigma_m| + 2\})$ is a function that maps the index (or priority)
of a message to the PC-precondition intervals of other processes.

• $size_i$ is the number of messages used in $N_i$.

Definition 4: The configuration of $\mathcal{S}_{IM}$ is $(\bigwedge_i (v_i, v_{env_i}, A_{next_i}), \bigwedge_j (occu_j, s_j, d_j, c_j))$, where $v_i, v_{env_i}, A_{next_i}, occu_j,
s_j, d_j, c_j$ are defined as in PISEM.

The change of configurations in IM can be interpreted analogously to PISEM; we omit the details here but mention three
differences:

1) An action $\sigma_j$ with the precondition $[\bigwedge_{m=1\ldots n_A} [pc_{j,m,low}, pc_{j,m,up})]$ may only be executed when $pc_{j,m,low} \le
A_{next_m} < pc_{j,m,up}$ for all $m$.

2) For processing a message, the timing constraints on transmission in PISEM are replaced by
the PC-precondition intervals of other processes in IM, similarly to 1).

3) The system repeats the cycle when $\forall x \in \{1, \ldots, n_A\}: A_{next_x} = |\sigma_x| + 1$ and $\forall x \in \{1, \ldots, n_N\}: occu_x = false$.
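Differences 1) and 3) amount to simple predicates over the program counters. The following Python sketch (hypothetical helper names; lists are indexed by process in order) evaluates them; the bounds [1, 1] and [2, 2] are those of $\sigma_1$ in $A_1$ from the example in Section V-A.

```python
def enabled(pre_lo, pre_up, a_next):
    """Difference 1): an action with PC-precondition [pre_lo[m], pre_up[m]) for
    every process m may fire only if pre_lo[m] <= A_next_m < pre_up[m] for all m."""
    return all(lo <= pc < up for lo, up, pc in zip(pre_lo, pre_up, a_next))


def cycle_complete(a_next, lengths, occu):
    """Difference 3): every process has passed its last action
    (A_next_x = |sigma_x| + 1) and no network is occupied."""
    return all(pc == k + 1 for pc, k in zip(a_next, lengths)) and not any(occu)


# sigma_1 of A_1 in the running example, with bounds [1, 1] and [2, 2]
print(enabled([1, 1], [2, 2], [1, 1]))          # True
print(enabled([1, 1], [2, 2], [1, 2]))          # False: A_2 already moved on
print(cycle_complete([4, 4], [3, 3], [False]))  # True
```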

IV. Games 

For the proofs of complexity results, we use notation similar to [18] to define distributed games. Intuitively, distributed
games are games involving multiple processes that do not interact with each other but only with the environment.

(Local) Games 

A game graph or arena is a directed graph $G = (V_0 \uplus V_1, E)$ whose nodes are partitioned into two classes $V_0$ and
$V_1$. We only consider the case of two players in the following and call them player 0 and player 1 for simplicity. A
play starting from node $v_0$ is simply a maximal path $\pi = v_0 v_1 \ldots$ in $G$, where we assume that player $i$ determines the
move $(v_k, v_{k+1}) \in E$ if $v_k \in V_i$ ($i \in \{0, 1\}$). With $Occ(\pi)$ we denote the set of nodes visited by a play $\pi$. A winning
condition defines when a given play $\pi$ is won by player 0; if it is not won by player 0, it is won by player 1. A node $v$
is won by player $i$ if player $i$ can always choose his moves in such a way that he wins any resulting play starting from
$v$.
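For a reachability winning condition ($Occ(\pi) \cap V_{goal} \neq \emptyset$) on a finite local game, the standard way to compute the nodes won by player 0 is the attractor construction. The Python sketch below is a textbook version of it, not part of our tool chain; it also returns a positional strategy, which always suffices for reachability in local (non-distributed) games. The function name and the example graph are made up for illustration.

```python
from collections import defaultdict


def attractor0(V0, V1, E, target):
    """Player-0 attractor: nodes from which player 0 can force every play to
    visit `target`. E is a set of edges (u, w). Dead ends outside `target` are
    losing for player 0, since the maximal play ends there.
    Returns (winning set, positional strategy on V0)."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, w in E:
        succ[u].add(w)
        pred[w].add(u)
    win, strategy = set(target), {}
    # For player-1 nodes: number of successors not yet known to be winning.
    remaining = {v: len(succ[v]) for v in V1}
    frontier = list(win)
    while frontier:
        w = frontier.pop()
        for u in pred[w]:
            if u in win:
                continue
            if u in V0:                 # player 0 needs one winning successor
                strategy[u] = w
                win.add(u)
                frontier.append(u)
            else:                       # player 1: all successors must be winning
                remaining[u] -= 1
                if remaining[u] == 0:
                    win.add(u)
                    frontier.append(u)
    return win, strategy


V0 = {"a", "c"}
V1 = {"b", "d", "x", "goal"}
E = {("a", "b"), ("a", "c"), ("b", "goal"), ("b", "c"), ("c", "goal"),
     ("d", "a"), ("d", "c"), ("x", "a"), ("x", "x")}
win, strat = attractor0(V0, V1, E, {"goal"})
print(sorted(win))  # ['a', 'b', 'c', 'd', 'goal']
print(strat)        # {'c': 'goal', 'a': 'c'}
```

Node x is lost for player 0 because player 1 can stay in the self-loop forever.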

Distributed Games 

We use the notation of Mohalik and Walukiewicz [18] to define distributed games. From now on, we call a game
graph as defined in Sec. IV a local game graph.

Definition 5: For all $i \in \{1, \ldots, n\}$, let $G_i = (V_{0_i} \uplus V_{1_i}, E_i)$ be a local game graph with the restriction that it is
bipartite. Define a distributed game to be $\mathcal{G} = (\mathcal{V}_0 \uplus \mathcal{V}_1, \mathcal{E}, Acc \subseteq (\mathcal{V}_0 \uplus \mathcal{V}_1)^\omega)$:

• $\mathcal{V}_1 = V_{1_1} \times \ldots \times V_{1_n}$ is the set of player-1 (environment) vertices.

• $\mathcal{V}_0 = (V_{0_1} \uplus V_{1_1}) \times \ldots \times (V_{0_n} \uplus V_{1_n}) \setminus \mathcal{V}_1$ is the set of player-0 (control) vertices.

  - For a vertex $x = (x_1, \ldots, x_n)$, we use the function $proj(x, i)$ to retrieve the $i$-th component $x_i$, and use
$proj(X, i)$ to retrieve the $i$-th components of a set of vertices $X$.

• Let $(x_1, \ldots, x_n), (x'_1, \ldots, x'_n) \in \mathcal{V}_0 \uplus \mathcal{V}_1$; then $\mathcal{E}$ is defined as follows:

  - If $(x_1, \ldots, x_n) \in \mathcal{V}_0$, then $((x_1, \ldots, x_n), (x'_1, \ldots, x'_n)) \in \mathcal{E}$ if and only if
$\forall i.\ (x_i \in V_{0_i} \rightarrow (x_i, x'_i) \in E_i) \wedge \forall j.\ (x_j \in V_{1_j} \rightarrow x_j = x'_j)$.

  - For $(x_1, \ldots, x_n) \in \mathcal{V}_1$, if $((x_1, \ldots, x_n), (x'_1, \ldots, x'_n)) \in \mathcal{E}$, then for every $x_i$, either $x_i = x'_i$ or $x'_i \in V_{0_i}$, and
moreover $(x_1, \ldots, x_n) \neq (x'_1, \ldots, x'_n)$.

• $Acc$ is the acceptance condition.

In a distributed game $\mathcal{G} = (\mathcal{V}_0 \uplus \mathcal{V}_1, \mathcal{E}, Acc)$, a play is defined analogously to local games: a play starting
from node $v_0$ is a maximal path $\pi = v_0 v_1 \ldots$ in $\mathcal{G}$, where player $i$ determines the move $(v_k, v_{k+1}) \in \mathcal{E}$ if $v_k \in \mathcal{V}_i$
($i \in \{0, 1\}$).

A distributed strategy of a distributed game for player 0 is a tuple of functions $\xi = (f_1, \ldots, f_n)$, where each function
$f_i : (V_{0_i} \uplus V_{1_i})^* \times V_{0_i} \rightarrow (V_{0_i} \uplus V_{1_i})$ is a local strategy that decides the updated location of local game $i$ based on
(a) its observable history of local game $i$ and (b) the current position of local game $i$. Lastly, we call a distributed strategy
positional if each $f_i$ is a function mapping from $V_{0_i}$ to $V_{0_i} \uplus V_{1_i}$, i.e., the update of the location depends only on the current
position of the local game.

Algorithm 1: GeneratePreconditionPC

Data: PISEM model S = (A, N, T)
Result: Two maps mapLB, mapUB, which map an action σ (or the processing of a message by a network) to two
        integer arrays lower[1 ... n_A], upper[1 ... n_A]
begin
    /* Initialize the maps recording the lower and upper bounds for actions */
    for action σ_k in A_i of A do
        mapLB.put(σ_k, new int[1 ... n_A](1))                /* initialize to 1 */
        mapUB.put(σ_k, new int[1 ... n_A])
        for A_j ∈ A do mapUB.get(σ_k)[j] := |σ_j| + 2        /* initialize to upper bound */
        mapLB.get(σ_k)[i] := k; mapUB.get(σ_k)[i] := k + 1   /* self PC */
    for action σ_m in A_i of A, m = 1, ..., |σ_i| do
        for action σ_n in A_j of A, n = 1, ..., |σ_j|, j ≠ i do
 (1)        if σ_m.releaseTime > σ_n.deadline then
                mapLB.get(σ_m)[j] := max{mapLB.get(σ_m)[j], n + 1}
 (2)        if σ_m.deadline < σ_n.releaseTime then
                mapUB.get(σ_m)[j] := min{mapUB.get(σ_m)[j], n + 1}
    /* Initialize the maps recording the lower and upper bounds for message transmission */
    for action σ_k = send(pre, ind, n, s, d, v, c) in A_i of A do
        mapLB.put(n.ind, new int[1 ... n_A](1))              /* initialize to 1 */
        mapLB.get(n.ind)[i] := k + 1                         /* strictly later than executing send() */
        mapUB.put(n.ind, new int[1 ... n_A])
        for A_j ∈ A do mapUB.get(n.ind)[j] := |σ_j| + 2      /* initialize to upper bound */
    for action σ_k = send(pre, ind, n, s, d, v, c) in A_i of A do
        for action σ_m in A_j of A, m = 1, ..., |σ_j| do
 (3)        if σ_k.releaseTime + best-case transmission time > σ_m.deadline then
                mapLB.get(n.ind)[j] := max{mapLB.get(n.ind)[j], m + 1}
 (4)        if σ_k.deadline + T_n(ind) < σ_m.releaseTime then
                mapUB.get(n.ind)[j] := min{mapUB.get(n.ind)[j], m + 1}

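Definition 6: A distributed game $\mathcal{G} = (\mathcal{V}_0 \uplus \mathcal{V}_1, \mathcal{E}, Acc)$ is reachability-winning by a distributed strategy $\xi =
(f_1, \ldots, f_n)$ over initial states $V_{ini} \subseteq \mathcal{V}_0 \uplus \mathcal{V}_1$ and target states $V_{goal} \subseteq \mathcal{V}_0 \uplus \mathcal{V}_1$ when the following conditions
hold:

• $Acc = \{v_0 v_1 \ldots \in (\mathcal{V}_0 \uplus \mathcal{V}_1)^\omega \mid Occ(v_0 v_1 \ldots) \cap V_{goal} \neq \emptyset\}$.

• For every play $\pi = v_0 v_1 v_2 \ldots$ where $v_0 \in V_{ini}$, player 0 wins $\pi$ when the following constraints hold:

  - $\forall i \in \mathbb{N}.\ v_i \in \mathcal{V}_0 \rightarrow (\forall j \in \{1, \ldots, n\}.\ (proj(v_i, j) \in V_{0_j} \rightarrow proj(v_{i+1}, j) = f_j(proj(v_i, j))))$.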
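Checking whether a given positional distributed strategy is reachability-winning is straightforward on a finite game, which is the core test an incomplete strategy search needs. The Python sketch below enumerates the product vertices reachable under the strategy and computes the set forced to $V_{goal}$. It is a brute-force illustration with hypothetical names, not the SAT- or BDD-based procedure referred to in the introduction; it assumes each $f_j$ only selects edges of $E_j$ and is defined on every reachable control vertex.

```python
def check_positional(env_succ, is_env, strategies, init, goal):
    """Brute-force check of Definition 6 for a positional distributed strategy.

    Global vertices are tuples (x_1, ..., x_n) of local vertices.
    env_succ(v): successors of a player-1 vertex (all components in V_1j).
    is_env(j, x): True iff local vertex x of local game j belongs to V_1j.
    strategies[j]: dict implementing f_j on the control vertices of game j.
    Returns True iff every play from `init` that follows the strategy visits `goal`.
    """
    def succ(v):
        if all(is_env(j, x) for j, x in enumerate(v)):
            return set(env_succ(v))
        # Player-0 move: each control component moves by f_j, others stay.
        return {tuple(x if is_env(j, x) else strategies[j][x] for j, x in enumerate(v))}

    # Explore vertices reachable under the strategy (plays stop mattering at goal).
    reach, stack = set(init), list(init)
    while stack:
        v = stack.pop()
        if v in goal:
            continue
        for w in succ(v):
            if w not in reach:
                reach.add(w)
                stack.append(w)
    # Least fixpoint: vertices whose non-empty successor set lies inside `win`.
    win, changed = reach & set(goal), True
    while changed:
        changed = False
        for v in reach - win:
            nxt = succ(v)
            if nxt and nxt <= win:
                win.add(v)
                changed = True
    return set(init) <= win


def is_env(j, x):
    return x.startswith("e")   # hypothetical naming: environment vertices start with "e"


env = {("e0", "e0"): [("g", "g")]}
for v in [("e0", "e1"), ("e1", "e0"), ("e1", "e1")]:
    env[v] = [("c", "c")]      # a wrong combination sends both games back to c
init, goal = [("c", "c")], {("g", "g")}
good = [{"c": "e0"}, {"c": "e0"}]
bad = [{"c": "e0"}, {"c": "e1"}]
print(check_positional(lambda v: env[v], is_env, good, init, goal))  # True
print(check_positional(lambda v: env[v], is_env, bad, init, goal))   # False
```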

V. Step A: Front-end Translation from Models to Games 

A. Step A.1: From PISEM to IM

To translate from PISEM to IM, the key is to abstract the release time and deadline information
specified in PISEM. As the system in our formulation is equipped with a globally synchronized clock, the execution of
actions respecting release times and deadlines can be translated into a partial order. Algorithm 1 concretizes this
idea by generating PC-intervals in all machines as

• temporal preconditions for an action to execute, or

• temporal preconditions for a network to finish processing a message, i.e., to update a variable in the destination
process with the value in the message.³

3 Here we assume that in each period, for all $N_j$, each message of type $ind \in \{1, \ldots, size_j\}$ is sent at most once. In this way, the algorithm can
assign a unique PC-precondition interval to every message type.
Figure 2. An illustration of Algorithm 1.



Starting from an initialization in which no PC is constrained, the algorithm restricts the intervals using the four
if-statements (1), (2), (3), and (4):

• In (1), if $\sigma_m.releaseTime > \sigma_n.deadline$, then $\sigma_n$ must have been executed before $\sigma_m$ is executed.

• In (2), if $\sigma_m.deadline < \sigma_n.releaseTime$, then $\sigma_n$ must not be executed before $\sigma_m$ is executed.

• A similar analysis applies to (3) and (4). However, we need to consider the combined effect of the
network transmission time: (3) uses the best-case transmission time, and (4) uses $\mathcal{T}_n(ind)$, the worst case.

[Example] For the example in Sec. II, consider the action $\sigma_1 := m \leftarrow \mathrm{InputRead}(v)[0, 40)$ in $A_1$ of the PISEM.
Algorithm 1 returns $mapLB(\sigma_1)$ and $mapUB(\sigma_1)$ as the two arrays [1, 1] and [2, 2], as illustrated in Figure 2. Based on
the definition of IM, $\sigma_1$ should be executed under the temporal precondition that no action in $A_2$ has been executed, satisfying
the semantics originally specified in PISEM. For the analysis of message sending time, two cases are shown in Figure 2,
where the WCMTT is estimated as 15ms and 30ms, respectively.
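The following Python sketch transcribes Algorithm 1 and reproduces the example above. Assumptions: process and action indices in the code are 0-based list positions while PC values stay 1-based; the best-case transmission time defaults to 0; the initialization and restriction of each message's bounds are merged into one loop, which is equivalent under footnote 3 (each message type is sent at most once per period); and the windows of B's last two actions are hypothetical, since Figure 1 only times RecvMsg(m).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Act:
    label: str
    release: float                         # release time alpha
    deadline: float                        # deadline beta
    msg: Optional[Tuple[int, int]] = None  # (network n, ind) for send actions


def generate_precondition_pc(procs, wcmtt, best_case=0.0):
    """Sketch of Algorithm 1 (GeneratePreconditionPC).

    procs[i]: action sequence of process i. wcmtt[(n, ind)] = T_n(ind).
    Actions are keyed by (i, k), messages by ("msg", n, ind); each value is a
    list of PC bounds indexed by process.
    """
    n_a = len(procs)
    top = [len(seq) + 2 for seq in procs]  # |sigma_j| + 2: unconstrained
    lb, ub = {}, {}
    for i, seq in enumerate(procs):
        for k in range(1, len(seq) + 1):
            lb[(i, k)], ub[(i, k)] = [1] * n_a, list(top)
            lb[(i, k)][i], ub[(i, k)][i] = k, k + 1  # self PC
    for i, seq_i in enumerate(procs):
        for m, a_m in enumerate(seq_i, start=1):
            for j, seq_j in enumerate(procs):
                if j == i:
                    continue
                for n, a_n in enumerate(seq_j, start=1):
                    if a_m.release > a_n.deadline:   # (1) a_n finished first
                        lb[(i, m)][j] = max(lb[(i, m)][j], n + 1)
                    if a_m.deadline < a_n.release:   # (2) a_n not started yet
                        ub[(i, m)][j] = min(ub[(i, m)][j], n + 1)
    for i, seq_i in enumerate(procs):
        for k, a_k in enumerate(seq_i, start=1):
            if a_k.msg is None:
                continue
            key = ("msg",) + a_k.msg
            lb[key], ub[key] = [1] * n_a, list(top)
            lb[key][i] = k + 1  # strictly later than executing send()
            for j, seq_j in enumerate(procs):
                for m, a_m in enumerate(seq_j, start=1):
                    if a_k.release + best_case > a_m.deadline:       # (3)
                        lb[key][j] = max(lb[key][j], m + 1)
                    if a_k.deadline + wcmtt[a_k.msg] < a_m.release:  # (4)
                        ub[key][j] = min(ub[key][j], m + 1)
    return lb, ub


# Figure 1 system; the windows of B's last two actions are hypothetical.
A = [Act("InputRead", 0, 40), Act("send", 0, 40, msg=(1, 1)), Act("PrintOut", 40, 100)]
B = [Act("RecvMsg", 60, 100), Act("PrintOut", 60, 100), Act("reset m_v", 60, 100)]
lb, ub = generate_precondition_pc([A, B], {(1, 1): 15.0})
print(lb[(0, 1)], ub[(0, 1)])                # [1, 1] [2, 2]
print(lb[("msg", 1, 1)], ub[("msg", 1, 1)])  # [3, 1] [5, 2]
```

With a WCMTT of 30ms instead of 15ms, restriction (4) no longer fires for B, and the upper bounds of the message stay at the unconstrained value [5, 5].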

B. Step A.2: From IM to Distributed Game 

Here we give the main concepts of how a game is created after Step A.1 is executed. To create a distributed game from a
given interleaving model $\mathcal{S}_{IM} = (\mathcal{A}, \mathcal{N})$, we proceed with the following three steps:

1) Step A.2.1: Creating non-deterministic timing choices for existing actions: During the translation from a PISEM
$\mathcal{S} = (\mathcal{A}, \mathcal{N}, T)$ to its corresponding IM $\mathcal{S}_{IM} = (\mathcal{A}, \mathcal{N})$, for every process $A_i$ in $\mathcal{A}$ and every action $\sigma[\alpha, \beta)$ in
$\sigma_i$, Algorithm 1 creates the PC-precondition interval $[\bigwedge_{m=1\ldots n_A} [pc_{m,low}, pc_{m,up})]$. Thus, in the
corresponding game, for $\sigma[\bigwedge_{m=1\ldots n_A} [pc_{m,low}, pc_{m,up})]$, each element $\sigma[\bigwedge_{m=1\ldots n_A} (pc_m)]$, where $pc_{m,low} \le pc_m < pc_{m,up}$,
is a nondeterministic transition choice that can be selected separately by the game engine.
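Step A.2.1 is a Cartesian product over the PC-precondition intervals. A minimal Python sketch (hypothetical function name), restricted to integer PC values; once FT slots add rational PCs, each range would instead run over the points of $iSet(\sigma_m)$ inside the interval. The second call uses illustrative bounds of the kind Algorithm 1 produces for a message.

```python
from itertools import product


def timing_choices(lower, upper):
    """All PC vectors (pc_1, ..., pc_nA) with lower[m] <= pc_m < upper[m];
    each vector is one nondeterministic transition choice sigma[/\\_m (pc_m)]."""
    return list(product(*(range(lo, up) for lo, up in zip(lower, upper))))


print(timing_choices([1, 1], [2, 2]))  # [(1, 1)]
print(timing_choices([3, 1], [5, 2]))  # [(3, 1), (4, 1)]
```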

2) Step A.2.2: Introducing fault-tolerant choices: In our framework, fault-tolerant mechanisms are similar to
actions and consist of two parts: an action pattern $\sigma$ and a timing precondition $[\bigwedge_{m=1\ldots n_A} [pc_{m,low}, pc_{m,up})]$. Whereas the
nondeterminism of existing actions comes from timing choices only, for fault tolerance the transition choices include all
combinations of (1) timing preconditions and (2) action patterns available from a predefined pool.

We use the notation $\sigma_\xi$, where $\xi \in \mathbb{Q} \setminus \mathbb{N}$, to represent an action pattern inserted between $\sigma_{\lfloor \xi \rfloor}$ and $\sigma_{\lceil \xi \rceil}$. With
this formulation, multiple FT mechanisms can be inserted between two consecutive actions $\sigma_i, \sigma_{i+1}$ originally in the
system, and the execution semantics follow the previous definitions: as executing an action updates $A_{next_i}$
to $\min\{x \mid x \in iSet(\sigma_i), x > j\}$, updating to a rational value is possible. Note that as $\sigma_\xi$ is only a fragment without
temporal preconditions, we use Algorithm 2 to generate all possible temporal preconditions satisfying the semantics of
the original interleaving model; after synthesis, only temporal preconditions satisfying the acceptance condition are
chosen.

We conclude this step with two remarks:

• For all existing actions, the non-deterministic choice generation in Step A.2.1 must be modified to include the
rational points introduced by FT mechanisms.

• A problem induced by FT synthesis is whether the system behavior changes due to the introduction of FT
mechanisms. We address this problem by splitting it into two subproblems:

  • [Problem 1] Whether the system is still schedulable after the introduction of FT actions, as these FT actions
also consume time. This can only be answered once the result of synthesis is generated, and we defer it to
Section VII.

  • [Problem 2] Whether the networking behavior remains the same. This problem must be handled before game
creation, as introducing an FT message may significantly influence the worst-case message transmission time
(WCMTT) of all existing messages, leading to completely different networking behavior. The answer to this
problem depends on many factors, including the hardware in use, the configuration settings, and the analysis
technique used to estimate the WCMTT. In Appendix A, we give a simple analysis for ideal CAN buses [7],
which are widely used in industrial and automotive embedded systems: in the analysis, we propose
conditions under which newly added messages do not change the existing networking behavior. A similar analysis can
be done for other timing-predictable networks, e.g., FlexRay [20].

Algorithm 2: DecideInsertedFTTemplateTiming

Data: σ_c[∧_{m=1...n_A} [pc_{c,m,low}, pc_{c,m,up})] and σ_d[∧_{m=1...n_A} [pc_{d,m,low}, pc_{d,m,up})], which are consecutive actions in σ_i
      of S_IM = (A, N), and one newly added action pattern σ_ξ to be inserted between them
Result: Temporal preconditions for the action pattern σ_ξ: [∧_{m=1...n_A} [pc_{ξ,m,low}, pc_{ξ,m,up})]
begin
    for m = 1, ..., n_A do
        if m ≠ i then
            pc_{ξ,m,low} := pc_{c,m,low}   /* use the lower bound of c as its lower bound */
            pc_{ξ,m,up} := pc_{d,m,up}     /* use the upper bound of d as its upper bound */
        else
            pc_{ξ,m,low} := ξ; pc_{ξ,m,up} := d
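Algorithm 2 and the combination of timing and pattern choices in Step A.2.2 can be sketched in Python as follows. The bounds of $\sigma_2$ and $\sigma_3$ in $A_1$ are illustrative values (assuming all of B's actions lie in [60, 100)), the pattern names `resend(m)` and `null-op` are hypothetical, and `ft_choices` assumes a single slot between $\sigma_c$ and $\sigma_d$, so that the own PC of $\sigma_\xi$ is exactly $\xi$.

```python
from fractions import Fraction
from itertools import product


def inserted_ft_timing(i, xi, d, pre_c, pre_d):
    """Sketch of Algorithm 2 (DecideInsertedFTTemplateTiming).

    pre_c, pre_d: dicts mapping each process m to (pc_low, pc_up) for the
    consecutive actions sigma_c and sigma_d of process i; the pattern
    sigma_xi (xi rational, c < xi < d) is inserted between them.
    """
    pre_xi = {}
    for m in pre_c:
        if m != i:
            pre_xi[m] = (pre_c[m][0], pre_d[m][1])  # lower of c, upper of d
        else:
            pre_xi[m] = (xi, d)                     # own PC in [xi, d)
    return pre_xi


def ft_choices(i, xi, pre_xi, pool):
    """Step A.2.2 (sketch): every combination of an action pattern from `pool`
    and a PC vector inside pre_xi is a player-0 choice."""
    axes = [[xi] if m == i else list(range(lo, up))
            for m, (lo, up) in sorted(pre_xi.items())]
    return [(p, pcs) for p in pool for pcs in product(*axes)]


pre = inserted_ft_timing(1, Fraction(5, 2), 3,
                         {1: (2, 3), 2: (1, 2)}, {1: (3, 4), 2: (1, 5)})
print(pre)  # {1: (Fraction(5, 2), 3), 2: (1, 5)}
print(len(ft_choices(1, Fraction(5, 2), pre, ["resend(m)", "null-op"])))  # 8
```

The widened window for process 2 (from the lower bound of $\sigma_2$ to the upper bound of $\sigma_3$) is what multiplies the number of choices; the game solver later discards those that violate the acceptance condition.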

3) Step A.2.3: Game Creation by Introducing Faults: In our implementation, we do not generate distributed games (DG) in their primitive form,
as the definition of DG is too low-level to manipulate. Instead, the algorithms in our implementation
are based on a variant we created, called symbolic distributed games (SDG):

Definition 7: Define a symbolic distributed game $\mathcal{G}_{abs} = (V_F \uplus V_{CTR} \uplus V_{ENV}, \mathcal{A}, \mathcal{N}, \sigma_f, pred)$.

• $V_F$, $V_{CTR}$, $V_{ENV}$ are disjoint sets of (fault, control, environment) variables.

• $pred : V_F \times V_{CTR} \times V_{ENV} \rightarrow \{true, false\}$ is the partition condition.

• $\mathcal{A} = \bigcup_{i=1\ldots n_A} A_i$ is the set of symbolic local games (processes), where $A_i = (V_i \cup V_{env_i}, \sigma_i)$ and

• $V_i$ is the set of variables, and $V_{env_i} \subseteq V_{ENV}$.

• $\sigma_i := \bigcup \sigma_{i_1}(\bigwedge_{m=1,\ldots,n_A} pc_{i_1 m}); \ldots; \bigcup \sigma_{i_k}(\bigwedge_{m=1,\ldots,n_A} pc_{i_k m})$ is a sequence, where
for all $j = 1, \ldots, k$, $\bigcup \sigma_{i_j}(\bigwedge_{m=1,\ldots,n_A} pc_{i_j m})$ is a set of choice actions for player 0 in $A_i$.

  - $\sigma_{i_j}$ is defined as in IM.

  - For all $m \in \{1, \ldots, n_A\}$, $pc_{i_j m} \in [pc_{i_j,m,low}, pc_{i_j,m,up})$, where $pc_{i_j,m,low}, pc_{i_j,m,up} \in iSet(\sigma_m)$.

• $V_{CTR} = \bigcup_{i=1\ldots n_A} V_i$.

• $\mathcal{N} = \bigcup_{i=1\ldots n_N} N_i$, $N_i = (\mathcal{T}_i, size_i, trans_i)$, is the set of network processes.

• $\mathcal{T}_i$ and $size_i$ are defined as in IM.

• $trans_i : V_F \times (\{true, false\} \times \{1, \ldots, n_A\}^2 \times \bigcup_{j=1\ldots n_A} (V_j \cup V_{env_j}) \times \mathbb{Z} \times \{1, \ldots, size_i\}) \rightarrow V_F \times (\{true, false\} \times
\{1, \ldots, n_A\}^2 \times \bigcup_{j=1\ldots n_A} (V_j \cup V_{env_j}) \times \mathbb{Z} \times \{1, \ldots, size_i\})$ is the network transition relation for processing
messages (see Sec. III-A for its meaning), which can additionally be influenced by variables in $V_F$.

• $\sigma_f : V_F \times V_{CTR} \times V_{ENV} \times \bigwedge_{i=1\ldots n_A} iSet(\sigma_i) \rightarrow V_{ENV} \times V_F \times \bigwedge_{i=1\ldots n_A} iSet(\sigma_i)$ is the environment update relation.
We establish an analogy between SDG and DG using Figure 3:

1) The configuration $v$ of an SDG is defined as the product of all variables used.

2) A play of an SDG starting from state $v_0$ is a maximal path $\pi = v_0 v_1 \ldots$, where

• at $v_k$, player 1 determines the move $(v_k, v_{k+1})$ when $pred(v_k)$ evaluates to true (player 0 when it evaluates to false);
the partition of vertices into $\mathcal{V}_0$ and $\mathcal{V}_1$ in an SDG is thus defined implicitly by $pred$, rather than explicitly
as in a distributed game;

• a move $(v_k, v_{k+1})$ is a selection of an executable transition defined in $\mathcal{N}$, $\sigma_f$, or $\mathcal{A}$; in our formulation, transitions
in $\mathcal{N}$ and $\sigma_f$ are all environment moves,⁴ while transitions in $\mathcal{A}$ are control moves.⁵

3) Lastly, a distributed positional strategy for player 0 in an SDG can be defined analogously, as uniquely selecting
an action from the set $\bigcup \sigma_{i_j}(\bigwedge_{m=1,\ldots,n_A} pc_{i_j m})$ for every $A_i$ and every program counter $j$ defined in $\sigma_i$. Each
strategy must be insensitive to the contents of other symbolic local games.

Figure 4. Creating the SDG from IM, FT mechanisms, and faults.

We now summarize the logical flow of game creation using Figure 4:

• (a) Based on the fixed number of slots (for FT mechanisms) specified by the user, extend IM to $IM_{frac}$ so that it
contains the fractional PC values induced by the slots.

• (b) Create $IM_{frac+FT}$, including the sequence of choice actions (as specified in the SDG), by

  • extracting the action sequences defined in $IM_{frac}$ into choices (Step A.2.1), and

  • inserting FT choices (Step A.2.2).

• (c) Introduce faults and partition player-0 and player-1 vertices: In engineering, a fault model specifies the potential
undesired behavior of a piece of equipment, such that engineers can predict its consequences for the system behavior.
Thus, a fault can be formulated as a triple:⁶

  1) the fault type (a unique identifier, e.g., MsgLoss, SensorError);

  2) the maximum number of occurrences in each period;

  3) additional transitions not included in the original specification of the system (fault effects).

We perform the translation into a game using the following steps.

• For (1), introduce variables to control the triggering of faults.

• For (2), introduce counters to bound the number of fault occurrences in each period.

• For (3), for each transition used in the component influenced by the fault, create a corresponding fault transition,
which is triggered by the variable and the counter; similarly, create a transition with normal behavior (also
triggered by the variable and the counter). Note that our framework is able to model faults acting on the
FT mechanisms themselves, for instance, message loss affecting the newly introduced FT messages.

[Example] We outline how a game (focusing on fault modeling) is created with the example in sec. [II} similar approaches 
can be applied for input errors or message corruption; here the modeling of input (for InputRead (m) ) is skipped. 

• Create the predicate pred: pred evaluates to false in all cases except (a) when the Boolean variable occu
(representing network occupancy) evaluates to true and (b) when, for all $i \in \{1, \ldots, n_A\}$, $\Delta next_i = \ldots + 1$
(end of period); the predicate partitions player-0 and player-1 vertices.

• For every process $i$ and program counter $j$, the set of choice actions (each guarded by a conjunction
$\bigwedge_{m=1,\ldots,n_A}$ of PC intervals) is generated based on the approach described previously.

• Create a variable $v_f \in V_f$, which indicates whether the fault (MsgLoss) has been activated in this period.

• In this example, as the maximum number of fault occurrences in each period is 1, we do not need to create 
additional counters. 

• For each message-sending transition $t$ in the network, create two normal transitions $(v_f = \mathit{true} \wedge v_f' = \mathit{true}) \wedge t$
and $(v_f = \mathit{false} \wedge v_f' = \mathit{false}) \wedge t$ in the game.

• For each message-sending transition $t$ in the network, generate a transition $t'$ where the message is sent but the
value is not updated at the destination. Create a fault transition $(v_f = \mathit{false} \wedge v_f' = \mathit{true}) \wedge t'$ in the game.

• Define an action $\sigma_f$ to control $v_f$: if, for all $i \in \{1, \ldots, n_A\}$, $\Delta next_i = \ldots + 1$, then update $v_f$ to false as $\Delta next_i$ is updated
to 1 (resetting the fault counter at the end of the period).
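To make the example concrete, the following is a minimal sketch of only the fault-related part of the game state for a single MsgLoss fault with at most one occurrence per period. The `State` type, the extra `lost` counter (kept purely to observe the effect) and all helper names are our own illustration, not part of the framework; because the bound is 1, the Boolean $v_f$ itself plays the role of the counter.

```python
from typing import NamedTuple


class State(NamedTuple):
    v_f: bool  # has the MsgLoss fault already fired in this period?
    lost: int  # messages lost so far (illustration only, not a game variable)


def send_successors(s: State) -> set:
    """Successors of one message-sending transition t in the game."""
    succ = set()
    # Normal transitions (v_f = true ∧ v_f' = true) ∧ t and
    # (v_f = false ∧ v_f' = false) ∧ t: v_f unchanged, message delivered.
    succ.add(State(s.v_f, s.lost))
    # Fault transition (v_f = false ∧ v_f' = true) ∧ t': the message is sent
    # but the destination is not updated; enabled only while v_f is false.
    if not s.v_f:
        succ.add(State(True, s.lost + 1))
    return succ


def end_of_period(s: State) -> State:
    """Action controlling v_f: reset the fault indicator at the end of the period."""
    return State(False, s.lost)


def after_sends(start: State, n_sends: int) -> set:
    """All states reachable after n_sends consecutive send transitions."""
    frontier = {start}
    for _ in range(n_sends):
        frontier = {nxt for s in frontier for nxt in send_successors(s)}
    return frontier
```

For two sends within one period, `after_sends(State(False, 0), 2)` yields only outcomes with at most one lost message; a new loss becomes possible only after `end_of_period` resets $v_f$.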

VI. Step B: Solving Distributed Games 
We summarize the result from [18] as a general property of distributed games.

4 As the definition of distributed games features multiple processes that do not interact with each other but only with the environment, an SDG
is also a distributed game. In the following sections, our proofs and algorithms are all based on DGs.

5 This constraint can be relaxed such that transitions in A can be either control (normal) or environment (fault-induced) moves; we leave
this formulation as future work.

6 For a complete formulation of fault models, we refer readers to our earlier work [5].
Theorem 1: There exist distributed games with a global winning strategy but (a) without distributed memoryless
strategies, or (b) where all distributed strategies require memory. In general, for a finite distributed game, it is undecidable to
check whether a distributed strategy exists from a given position [18].

As the problem is undecidable in general, we restrict our attention to finding a distributed positional strategy for
player-0, if one exists. We also focus on games with reachability winning conditions. Under these restrictions, the
problem is NP-complete.

Theorem 2: [PositionalDG_0] Given a distributed game $\mathcal{G} = (V_0 \uplus V_1, E)$, an initial state $x = (x_1, \ldots, x_n)$ and a
target state $t = (t_1, \ldots, t_n)$, deciding whether there exists a positional (memoryless) distributed strategy for player-0
from $x$ to $t$ is NP-complete.
Proof: 

We first recall the definition of the attractor, a notion commonly used in games and applied later in
the proof. Given a game graph $G = (V_0 \uplus V_1, E)$, for $i \in \{0, 1\}$ and $X \subseteq V$, the map $attr_i(X)$ is defined by

$attr_i(X) := X \cup \{v \in V_i \mid vE \cap X \neq \emptyset\} \cup \{v \in V_{1-i} \mid \emptyset \neq vE \subseteq X\}$,

i.e., $attr_i(X)$ extends $X$ by all those nodes from which either player $i$ can move to $X$ within one step or player $1-i$
cannot avoid moving to $X$ within the next step ($vE$ denotes the set of successors of $v$). Then $Attr_i(X) := \bigcup_{k \in \mathbb{N}} attr_i^k(X)$
contains all nodes from which player $i$ can force any play to visit the set $X$.
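As an illustration of the definition, the sketch below computes $Attr_i(X)$ on an explicit finite game graph by applying $attr_i$ until nothing changes. It is a plain set-based version for intuition (vertex names and the function name are our own), not the BDD-based implementation used in the framework.

```python
def attractor(v0, v1, edges, target, player=0):
    """Attr_player(target): vertices from which `player` can force a visit to target.

    v0, v1: disjoint sets of player-0 and player-1 vertices.
    edges:  iterable of (u, w) pairs.
    """
    vertices = v0 | v1
    succ = {v: set() for v in vertices}
    for u, w in edges:
        succ[u].add(w)
    own = v0 if player == 0 else v1
    attr = set(target)
    changed = True
    while changed:  # apply attr_i repeatedly until the least fixpoint is reached
        changed = False
        for v in vertices - attr:
            if v in own:
                ok = bool(succ[v] & attr)  # vE ∩ X ≠ ∅
            else:
                ok = bool(succ[v]) and succ[v] <= attr  # ∅ ≠ vE ⊆ X
            if ok:
                attr.add(v)
                changed = True
    return attr
```

For instance, with $V_0 = \{a, c\}$, $V_1 = \{b, d, t\}$ and edges $a \to b$, $a \to d$, $b \to t$, $b \to c$, $c \to c$, $d \to t$, $t \to t$, player-0 can force a visit to $t$ only from $\{t, d, a\}$, because the environment can divert $b$ to the self-loop at $c$.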

We continue our argument as follows. 

[NP] The reachability problem for a distributed game can be solved in NP: a solution instance $f = (f_1, \ldots, f_n)$ is
a strategy which selects exactly one edge for every control vertex in each local game. As the distributed game graph is
known, after the selection we calculate the reachability attractor $Attr_0(\{t\})$ of the distributed game; during the calculation
we ignore transitions that are not selected by the strategy in the local games. This means that in the distributed game,
to add a control vertex $u \in V_0$ to the attractor using the edge $(u, v)$, we must ensure that
$\forall j \in \{1, \ldots, n\}.\ (proj(u, j) \in V_0^j \rightarrow proj(v, j) = f_j(proj(u, j)))$. Lastly, we check whether the initial state is contained in the attractor; the whole calculation and checking
process can be done in deterministic polynomial time.
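The certificate check can be sketched as follows: the attractor computation is restricted so that a control vertex joins only through an edge that agrees with every local choice $f_j$. The data layout (global vertices as tuples, `strategy[j]` as a dictionary) and the function name are hypothetical. We also assume, following our reading of Mohalik and Walukiewicz's definition, that a global vertex is an environment vertex only when all of its components are environment vertices.

```python
def verify_positional(global_edges, local_control, strategy, init, target):
    """Check a candidate positional distributed strategy for reachability.

    global_edges:  set of (u, v) pairs; u and v are tuples of local vertices.
    local_control: local_control[j] is the set V_0^j of control vertices of game j.
    strategy:      strategy[j][x] is the local successor f_j(x) chosen for x in V_0^j.
    """
    vertices = {u for u, _ in global_edges} | {v for _, v in global_edges}
    vertices |= {init} | set(target)
    succ = {x: set() for x in vertices}
    for u, v in global_edges:
        succ[u].add(v)

    def is_control(u):
        # Assumed convention: environment vertex only if all components are.
        return any(u[j] in local_control[j] for j in range(len(u)))

    def consistent(u, v):
        # Every component in a control position must follow its local strategy.
        return all(v[j] == strategy[j][u[j]]
                   for j in range(len(u)) if u[j] in local_control[j])

    attr = set(target)
    changed = True
    while changed:  # least fixpoint of the restricted attractor Attr_0(target)
        changed = False
        for u in vertices - attr:
            if is_control(u):
                ok = any(v in attr and consistent(u, v) for v in succ[u])
            else:
                ok = bool(succ[u]) and succ[u] <= attr
            if ok:
                attr.add(u)
                changed = True
    return init in attr
```

Each pass over the vertices is polynomial and at most $|V|$ passes are needed, which matches the polynomial-time verification argument above. In a toy game where two local controllers must both pick the same side to win, the strategy pair (L, L) passes the check while (L, R) fails it.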

[NP-C] For NP-hardness, we reduce 3SAT to the problem of finding positional strategies in a
distributed game. Given a set of 3CNF clauses $\{C_1, \ldots, C_m\}$ over the literals $\{var_1, \overline{var_1}, \ldots, var_n, \overline{var_n}\}$ and
variables $\{var_1, \ldots, var_n\}$, the distributed game $\mathcal{G}$ is created as follows (see Figure 5 for an illustration):
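• Create 3 local games $G_1$, $G_2$, and $G_3$, where for $G_i = (V_{0i} \uplus V_{1i}, E_i)$:

• $V_{0i} = \{var_1, \ldots, var_n\}$, $V_{1i} = \{S, T_{var_1}, F_{var_1}, \ldots, T_{var_n}, F_{var_n}\}$.

• $E_i = \bigcup_{j=1,\ldots,n} \{(var_j, T_{var_j}), (var_j, F_{var_j})\}$.

• Create the local game $G_4 = (V_{04} \uplus V_{14}, E_4)$:

• $V_{04} = \{OK_0, NO_0\} \cup \bigcup_{i=1,\ldots,m+n} \{v_{i0}\}$.

• $V_{14} = \{S, OK_1, NO_1\} \cup \bigcup_{i=1,\ldots,m+n} \{v_{i1}\}$.

• $E_4 = \bigcup_{i=1,\ldots,m+n} \{(v_{i0}, v_{i1})\} \cup \{(OK_0, OK_1), (NO_0, NO_1)\}$.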

• Second, create the distributed game $\mathcal{G}$ from the local games above, and define the set of environment transitions to
include the following types, derived from the 3SAT instance:

1) (Intention to check SAT) In the 3SAT problem, for clause $C_i = (l_{1i} \vee l_{2i} \vee l_{3i})$, let the variables for
literals $l_{1i}, l_{2i}, l_{3i}$ be $var_{1i}, var_{2i}, var_{3i}$. Create a transition in the distributed game from $(S, S, S, S)$ to
$(var_{1i}, var_{2i}, var_{3i}, v_{i0})$.

2) (Intention to check consistency) In the 3SAT problem, for variable $var_i$, create a transition in the distributed
game from $(S, S, S, S)$ to $(var_i, var_i, var_i, v_{(m+i)0})$.

3) (Result of clause) In the 3SAT problem, for clause $C_i = (l_{1i} \vee l_{2i} \vee l_{3i})$, let the variables for the clause be
$var_{1i}, var_{2i}, var_{3i}$. We refer to the vertex evaluating $var_{ki}$ as true as $T_{ki}$ in the local game $G_k$; similarly, we
use $F_{ki}$ for a variable evaluated as false. For each clause $C_i$, enumerate over the 8 possible assignments
of $var_{1i}, var_{2i}, var_{3i}$:

a) For assignments that make $C_i$ true, create an edge from the assignment to $(var_1, var_1, var_1, OK_0)$;
for example, if $var_{1i} = true$, $var_{2i} = false$, $var_{3i} = true$ is a satisfying assignment for $C_i$, create
an edge $((T_{1i}, F_{2i}, T_{3i}, v_{i1}), (var_1, var_1, var_1, OK_0))$.

b) For assignments that make $C_i$ false, create an edge from the assignment to $(var_1, var_1, var_1, NO_0)$.

4) (Result of variable consistency) For all $i \in \{1, \ldots, n\}$:

a) Create two edges $((T_i, T_i, T_i, v_{(m+i)1}), (var_1, var_1, var_1, OK_0))$ and
$((F_i, F_i, F_i, v_{(m+i)1}), (var_1, var_1, var_1, OK_0))$.

b) For the other 6 combinations $(T_i, F_i, F_i, v_{(m+i)1})$, $(F_i, F_i, T_i, v_{(m+i)1})$, $(F_i, T_i, F_i, v_{(m+i)1})$,
$(T_i, T_i, F_i, v_{(m+i)1})$, $(F_i, T_i, T_i, v_{(m+i)1})$, $(T_i, F_i, T_i, v_{(m+i)1})$, create edges to $(var_1, var_1, var_1, NO_0)$.

5) (Continuous execution) For all $i \in \{1, \ldots, n\}$:

a) For all 8 combinations $(X, Y, Z, OK_1)$ with $X, Y, Z \in \{T_i, F_i\}$, create edges to $(var_1, var_1, var_1, OK_0)$.

b) For all 8 combinations $(X, Y, Z, NO_1)$ with $X, Y, Z \in \{T_i, F_i\}$, create edges to $(var_1, var_1, var_1, NO_0)$.

We claim that $\{C_1, \ldots, C_m\}$ is satisfiable iff $\mathcal{G}$ has a positional distributed strategy to reach $(var_1, var_1, var_1, OK_0)$
from $(S, S, S, S)$.

1) If $\{C_1, \ldots, C_m\}$ is satisfiable, let the set of satisfying literals be $L'$, and assume that for each pair
$(var_i, \overline{var_i})$ exactly one literal is in $L'$ (this is always possible). For the distributed game $\mathcal{G}$, in the local games
$G_1$, $G_2$ and $G_3$, let the positional strategy for control vertex $var_i$ move to $T_i$ if $var_i \in L'$, and move to $F_i$ if
$\overline{var_i} \in L'$ (for $G_4$, simply use the local edge). In a play, as player-1 makes the first move, any of his selections leads to
a player-0 vertex:

• If player-1 chooses an edge of type 1 (intention to check a clause), then in $G_1$, $G_2$ and $G_3$ each vertex
uses its positional strategy, which corresponds to the assignment of the variables in the clause. Since this assignment
satisfies the clause, the combined move forces player-1 to choose an edge of type 3(a), leading to the target state.

• If player-1 chooses an edge of type 2 (intention to check consistency), then, as the positional strategies for $G_1$, $G_2$
and $G_3$ are all derived from the same satisfying instance of the 3SAT problem, each strategy performs
the same move from $var_i$ to $T_i$ or to $F_i$; the combined move of player-0 forces player-1 to choose an edge of
type 4(a), leading to the target state.

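2) Consider a distributed positional strategy $(f_1, f_2, f_3, f_4)$ which reaches
$(var_1, var_1, var_1, OK_0)$ from $(S, S, S, S)$. In $G_1$, each control vertex $var_i$ points to $T_i$ or $F_i$. The positional
strategy of $G_1$ generates a satisfying instance of the 3SAT problem:

• Assign $var_i$ in the 3SAT problem to true if the strategy points vertex $var_i$ in $G_1$ to $T_i$.

• Assign $var_i$ in the 3SAT problem to false if the strategy points vertex $var_i$ in $G_1$ to $F_i$.

This assignment is satisfying: player-1 may choose a type-2 edge for any variable, so the strategies of $G_1$, $G_2$ and $G_3$
must agree on every $var_i$ (otherwise a type-4(b) edge leads to $(var_1, var_1, var_1, NO_0)$, from which only the loop of
type-5(b) edges is reachable); likewise, player-1 may choose a type-1 edge for any clause, so the assignment must make every
clause true (otherwise a type-3(b) edge leads to the same $NO$ loop).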
We analyze the size of the game and the time required to perform the reduction. 

• For $i = 1, 2, 3$, $G_i$ contains $3n + 1$ vertices, and $G_4$ has $2(m + n + 2) + 1$ vertices. As the number of vertices of the
distributed game is the product of these, it is polynomial in the size of the original 3SAT instance.

• Consider the time required to perform the reduction from 3SAT to PositionalDG_0.

• For $i = 1, 2, 3$, $G_i$ is constructed in $O(n)$.

• $G_4$ is constructed in $O(m + n)$.

• For the distributed game, vertices are constructed in time polynomial in $m$ and $n$, more precisely $O(n^3(m + n))$.

• For edges in the distributed game, we consider the most complicated case, i.e., creating an edge of type 3. It still
takes constant time to check and establish each connection, and for each player-1 vertex except $(S, S, S, S)$ (which
has $m + n$ edges), at most 8 edges are created. Therefore, the total time required for edge construction is also
polynomial in $m$ and $n$.

Therefore, 3SAT $\leq_{poly}$ PositionalDG_0, which concludes the proof. ■
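A quick way to sanity-check the size analysis is to materialize the local vertex sets of the reduction. The sketch below builds them as listed in the construction (vertex names are plain strings chosen for illustration) and multiplies the local sizes, as in the analysis above.

```python
def reduction_vertex_sets(n: int, m: int):
    """Vertex sets of the local games in the 3SAT reduction (n variables, m clauses)."""
    # G_1, G_2, G_3: control vertices var_j; environment vertices S, T_var_j, F_var_j
    g_sat = ({f"var{j}" for j in range(1, n + 1)}
             | {"S"}
             | {f"{b}{j}" for j in range(1, n + 1) for b in ("T", "F")})
    # G_4: v_{i0}, v_{i1} for i = 1..m+n, plus S, OK_0, OK_1, NO_0, NO_1
    g4 = ({"S", "OK0", "OK1", "NO0", "NO1"}
          | {f"v{i}_{p}" for i in range(1, m + n + 1) for p in (0, 1)})
    return g_sat, g4


def global_vertex_count(n: int, m: int) -> int:
    """Number of global vertices: the product of the four local sizes."""
    g_sat, g4 = reduction_vertex_sets(n, m)
    return len(g_sat) ** 3 * len(g4)
```

With $n = 3$ variables and $m = 2$ clauses, each of $G_1, G_2, G_3$ has $3 \cdot 3 + 1 = 10$ vertices, $G_4$ has $2(2 + 3 + 2) + 1 = 15$, and the product is $10^3 \cdot 15 = 15000$, in line with the $O(n^3(m + n))$ bound.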
With the NP-completeness result, finding a distributed reachability strategy for distributed games amounts to a
search problem. For example, it is possible to perform a bounded-depth forward search over choices of local
transitions: during the search, each selection of edges is constructed as a node in the search tree, and the set of
vertices reachable under that selection (represented as a BDD) is also stored in the tree node. This method is currently
implemented in our framework.

A. Solving Distributed Games using SAT Methods 

Apart from the search method above, in this section we give an alternative approach based on a reduction to SAT.
Madhusudan, Nam, and Alur [1] designed the bounded witness algorithm (based on unrolling) for solving reachability
(local) games. Although in their experiments the witness algorithm is not as efficient as the BDD-based approach
for centralized games, we find the concept potentially useful for solving distributed games. For this purpose, we have created a
variation (Algorithm 3).

To provide intuition, we first paraphrase the concept of a witness defined in [1]: a set of states which witnesses the
fact that the player wins. In [1], consider the SAT problem generated from a local game $G = (V_0 \uplus V_1, E)$ trying to reach
$V_{goal}$ from $V_{init}$. For $i = 1, \ldots, d$ and vertex $v \in V_0 \uplus V_1$, the variable $(v)_i$ is true when one of the following holds:

1) $v \in V_{init}$ and $i = 1$ (if $v \notin V_{init} \wedge i = 1$ then $(v)_i = false$).

2) $v \in V_{goal}$ (if $v \notin V_{goal} \wedge i = d$ then $(v)_i = false$).

3) $v \in V_0 \setminus V_{goal}$ and $\exists v' \in V_0 \uplus V_1.\ \exists e \in E.\ \exists j > i.\ (e = (v, v') \wedge (v')_j = true)$.

4) $v \in V_1 \setminus V_{goal}$ and $\forall e = (v, v') \in E.\ \exists j > i.\ (v')_j = true$.

This recursive definition implies that if $v$ in $V_0$ (resp. in $V_1$) is not a goal but is in the witness set, then there exists one
(resp. for all) successor $v'$ which is either (i) a goal state or (ii) also in the witness; note that for (ii), the
number of allowable steps to reach the goal decreases by one. This definition ensures that all plays defined by the
witness reach the goal from the initial state within $d - 1$ steps: if a play (starting from the initial state) has proceeded
$d - 1$ steps and reached $u \notin V_{goal}$, then based on (2), $(u)_d$ should be false. However, based on (1), (3), (4), $(u)_d$
should be set to true (reachable from the initial states in $d - 1$ steps). Thus the SAT problem is unsatisfiable.

In general, Algorithm 3 creates constraints based on the above concept, but compared to the bounded local game
reachability algorithm in [1], it contains slight modifications:

1) When a variable $(v)_i$ evaluates to true, vertex $v$ can reach the target state within $d - i$ steps,
which is the same as defined in [1]. However, we introduce additional variables for edges in local games, as
shown in STEP 1: when a variable $(e)$ evaluates to true, the distributed strategy uses the local transition $e$.

2) To achieve locality, we must include the constraints specified in STEP 4: the positional (memoryless) strategy disallows
changing the local edge used from a given vertex.

3) We modify the impact of control edge selection in STEP 6 by adding an additional implication "$(e) \Rightarrow$" to the
original constraint in the witness algorithm [1]. As in Mohalik and Walukiewicz's formulation all subgames
in a control position must make a move (the progress of a global move is a combination of local moves), we
need to create constraints considering all possible local edge combinations.
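The locality constraint of modification 2) (STEP 4 of Algorithm 3), together with the edge-selection requirement of STEP 5, reduces to two small clause families. The sketch below writes them in CNF over integer literals (DIMACS-style: a positive number for a variable, a negative one for its negation) for a single local control vertex; the function names are hypothetical, and a brute-force model enumerator stands in for a real SAT solver.

```python
from itertools import combinations, product


def step4_step5_clauses(edge_vars, level_vars):
    """CNF clauses (lists of int literals) for one local control vertex.

    edge_vars:  variable ids of (e_1), ..., (e_k) leaving the vertex.
    level_vars: ids of the witness variables (v)_1, ..., (v)_d of a global
                vertex v whose component is this control vertex.
    """
    clauses = []
    # STEP 4: (e) => (not e_1 and ... and not e_k) over all other edges;
    # in CNF: one clause (not e or not e') per unordered pair of edges.
    for a, b in combinations(edge_vars, 2):
        clauses.append([-a, -b])
    # STEP 5: ((v)_1 or ... or (v)_d) => (e_1 or ... or e_k);
    # in CNF: one clause (not (v)_i or e_1 or ... or e_k) per level i.
    for lv in level_vars:
        clauses.append([-lv] + list(edge_vars))
    return clauses


def models(num_vars, clauses):
    """Yield every satisfying assignment by brute force (tiny examples only)."""
    for bits in product((False, True), repeat=num_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            yield assignment
```

For a vertex with three outgoing edges (variables 1, 2, 3) and depth $d = 2$ (variables 4, 5), adding the unit clause `[4]` (the vertex is in the witness at level 1) leaves exactly the models in which one edge is selected, i.e., a positional choice.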

In Appendix B, we give an alternative algorithm working with a different formulation of distributed games where, in each
control location, only one local game can move: a run of the game may execute multiple local moves until it reaches a
state where all local games are in an environment position. We find this alternative formulation closer to the interleaving
semantics of distributed systems.

VII. Conversion from Strategies to Concrete Implementations 

Once the distributed game has returned a positive result, and assuming that the result is represented as an IM, the
remaining problem is to check whether the synthesized result can be translated to PISEM and thus further to a concrete
implementation. If the worst-case execution time (WCET) of each existing action and each newly generated FT mechanism is known
(with available WCET tools, e.g., AbsInt⁷), then we can always answer whether the system is implementable by a
full system rescheduling, which can be complicated. Nevertheless, based on our system modeling (which assumes a
globally synchronized clock), performing modifications on the release time or the deadline of existing actions from the
Algorithm 3: PositionalDistributedStrategy_BoundedSAT_0

Data: Distributed game graph 𝒢 = (V_0 ⊎ V_1, E), set of initial states V_init, set of target states V_goal, the unrolling
depth d

Result: whether a distributed positional strategy exists to reach V_goal from V_init
begin
  let clauseList := getEmptyList()   /* Store all clauses for the SAT solver */
  /* STEP 1: Variable creation */
  for v = (v_1, ..., v_m) ∈ V_0 ⊎ V_1 do
    create d Boolean variables (v)_1, ..., (v)_d, where (v)_j abbreviates (v_1, ..., v_m)_j
  for local control transition e = (x_i, x'_i) ∈ E_i, x_i ∈ V_0^i do
    create Boolean variable (e)
  /* STEP 2: Initial state constraints */
  for v = (v_1, ..., v_m) ∈ V_0 ⊎ V_1 do
    if v ∈ V_init then clauseList.add([(v)_1])
    else clauseList.add([¬(v)_1])
  /* STEP 3: Target state constraints */
  for v = (v_1, ..., v_m) ∈ V_0 ⊎ V_1 do
    if v ∈ V_goal then clauseList.add([(v)_1 ∧ ... ∧ (v)_d])
    else clauseList.add([¬(v)_d])
  /* STEP 4: Unique selection of local transitions (for distributed positional strategy) */
  for local control transition e = (x_i, x'_i) ∈ E_i, x_i ∈ V_0^i do
    for local transitions e_1 = (x_i, x'_{i1}), ..., e_k = (x_i, x'_{ik}) ∈ E_i with e_1, ..., e_k ≠ e do
      clauseList.add([(e) ⇒ (¬(e_1) ∧ ... ∧ ¬(e_k))])
  /* STEP 5: If a control vertex is in the attractor (winning region) but not a goal,
     an edge should be selected to reach the goal state */
  for v = (v_1, ..., v_m) ∈ V_0 do
    for v_i, i = 1, ..., m do
      if v_i ∈ V_0^i and v ∉ V_goal then
        let ⋃_j {e_j} be the set of local transitions starting from v_i in G_i
        if ⋃_j {e_j} ≠ ∅ then clauseList.add([((v)_1 ∨ ... ∨ (v)_d) ⇒ ⋁_j (e_j)])
  /* STEP 6: Impact of control edge selection (simultaneous progress) */
  for v = (v_1, ..., v_m) ∈ V_0 do
    forall edge combinations (e_1, ..., e_m): e_i = (v_i, v'_i) ∈ E_i when v_i ∈ V_0^i, or e_i = (v_i, v_i) when v_i ∈ V_1^i do
      /* e_i = (v_i, v_i) for v_i ∈ V_1^i are simply dummy edges for ease of formulation */
      for j = 1, ..., d − 1 do
        clauseList.add([(v)_j ⇒ ((⋀_{i | v_i ∈ V_0^i} (e_i)) ⇒ ((v')_{j+1} ∨ ... ∨ (v')_d))]), where v' = (v'_1, ..., v'_m)
  /* STEP 7: Impact of environment vertex */
  for environment vertex v = (v_1, ..., v_m) ∈ V_1 do
    let the set of successors be ⋃_i {v^i}
    for j = 1, ..., d − 1 do
      clauseList.add([(v)_j ⇒ ⋀_i ((v^i)_{j+1} ∨ ... ∨ (v^i)_d)])
  /* STEP 8: Invoke the SAT solver: return true when satisfiable */
  return invokeSATsolver(clauseList)
Figure 6. An example where FT primitives are introduced for synthesis.



synthesized IM can be translated to a linear constraint system, as in the synthesized IM each action contains a timing
precondition based on program counters. Here we give a simplified algorithm which performs local timing modification
(LTM). Intuitively, LTM partitions either

1) the interval $d$ between the deadline of action $\sigma_{\lfloor k \rfloor}$ and the release time of $\sigma_{\lceil k \rceil}$, if (a) $\sigma_{\lceil k \rceil}$ exists and (b) $d \neq 0$, or

2) the execution interval of action $\sigma_{\lfloor k \rfloor}$, if $\sigma_{\lfloor k \rfloor}$ exists.

In the algorithm, we assume that for every action $\sigma_d$, $d \in \mathbb{N}$, where no FT mechanism is introduced between $\sigma_d$
and $\sigma_{d+1}$ during synthesis, its release time and deadline do not change; this assumption can be checked later or
added explicitly to the constraint system under solving (it is not listed here for simplicity). Then we solve
a constraint system to derive the release time and deadline of all introduced FT actions. Algorithm 4 performs this
computation;⁸ for simplicity, it assumes that at most one FT action exists between two actions $\sigma_i$, $\sigma_{i+1}$ (in our implementation
this assumption is relaxed):

• Item (1) performs an interval split between $\sigma_{\lfloor k \rfloor}$ and $\sigma_k$.

• Item (3) assigns the deadline of $\sigma_k$ to be the original deadline of $\sigma_{\lfloor k \rfloor}$.

• Items (4) and (5) ensure that each reserved time interval is greater than the corresponding WCET.

• Items (6) to (11) introduce constraints from other processes:

• Items (6), (7), (8) consider existing actions which do not change their deadline and release time; for these, fetch the
timing information from PISEM.

• Items (9), (10), (11) consider newly introduced actions or existing actions which change their deadline and release
time; for these actions, use variables to construct the constraints.

• Item (12) is a conservative dependency constraint between $\sigma_k$ and a send action $\sigma_d$.

VIII. Implementation and Case Studies 

For the implementation, we have created our prototype software as an Eclipse plug-in, called GECKO,⁹ which offers an
open platform based on the model-based approach to facilitate the design, synthesis, and code generation for fault-tolerant
embedded systems. Currently the engine implements the search-based algorithms, and the SAT-based algorithm
is evaluated independently in GAVS+,¹⁰ a tool for the visualization and synthesis of games.

To evaluate our approach, we reuse the example in Sec. II and perform automatic tuning synthesis for the selected
FT mechanisms. The models specified in this section, as well as the GECKO Eclipse plug-in which generates the result,
are available on the website.

A. Example from Section 2 

In this example, the user selects a set of FT mechanism templates with the intention to implement a fail-then-resend
operation, as shown in Figure 6. The selected patterns introduce two additional messages into the system, and the
goal is to orchestrate the multiple synchronization points introduced by the FT mechanisms between A and B (the timing
in the FT mechanisms is unknown). The fault model, similar to Sec. III, assumes that at most one message
loss occurs in each period.

Once GECKO receives the system description (including the fault model) and the reachability specification, it
translates the system into a distributed game. In Figure 7, the set of possible control transitions is listed;¹¹ the solver

8 Here we list case 2 only; for case 1 a similar analysis can be applied.
9 http://www6.in.tum.de/~chengch/gecko/
10 http://www6.in.tum.de/~chengch/gavs/

11 In our implementation, the PC starts from 0 rather than 1, which differs from the formulation in IM and PISEM.



Algorithm 4: LocalTimingModification

Data: Original PISEM S = (A, A_f, T), synthesized IM S' = …
Result: For each σ_k and σ_{⌊k⌋}, their execution interval [α …

For convenience, use (X in S) to represent the value X retrieved from PISEM S.
begin
  /* Type A constraints: causalities within the process */
  for σ_k[⋀_{m=1..n_A} [pc_{k,m,low}, pc_{k,m,up})] in σ of A_i do
    let α_k, β_k, α_{⌊k⌋}, β_{⌊k⌋} be new variables for the constraint system
    (1) constraints.add(α_k = β_{⌊k⌋})
    (2) constraints.add(α_{⌊k⌋} = (α_{⌊k⌋} in S))
    (3) constraints.add(β_k = (β_{⌊k⌋} in S))
    (4) constraints.add(β_{⌊k⌋} − α_{⌊k⌋} > WCET(σ_{⌊k⌋}))
    (5) constraints.add(β_k − α_k > WCET(σ_k))

  /* Type B constraints: causalities crossing different processes (j ≠ i) */
  for σ_k[⋀_{m=1..n_A} [pc_{k,m,low}, pc_{k,m,up})] in σ of A_i do
    for σ_d[⋀_{m=1..n_A} [pc_{d,m,low}, pc_{d,m,up})] in σ of A_j do
      if d ∈ ℕ and there exists no σ_e ∈ σ_j with ⌊e⌋ = d then
        (6) if pc_{d,j,up} < pc_{k,j,low} then constraints.add((β_d in S) < α_k)
        (7) if pc_{d,j,low} > pc_{k,j,up} then constraints.add((α_d in S) > β_k)
        (8) if σ_k := send(pre, ind, n, dest, v, c) ∧ pc_{d,j,low} > pc_{k,j,up} then
              constraints.add((α_d in S) > β_k + WCMTT(n, ind))
      else
        (9) if pc_{d,j,up} < pc_{k,j,low} then constraints.add(β_d < α_k)
        (10) if pc_{d,j,low} > pc_{k,j,up} then constraints.add(α_d > β_k)
        (11) if σ_k := send(pre, ind, n, dest, v, c) ∧ pc_{d,j,low} > pc_{k,j,up} then
              constraints.add(α_d > β_k + WCMTT(n, ind))

  /* Type C constraints: conservative data dependency constraints */
  for σ_k[⋀_{m=1..n_A} [pc_{k,m,low}, pc_{k,m,up})] in σ of A_i do
    for σ_d[⋀_{m=1..n_A} [pc_{d,m,low}, pc_{d,m,up})] in σ of A_j do
      (12) if σ_d := send(pre, ind, n, dest, v, c) ∧ σ_k reads variable c ∧ pc_{d,j,up} < pc_{k,j,low} then
              constraints.add((β_d in S) + WCMTT(n, ind) < α_k)

  solve constraints using (linear) constraint solvers.

generates an appropriate PC-precondition for each action to satisfy the specification. In Figure 7, bold numbers (e.g.,
{0000}) indicate the synthesized result. The timeline of the execution (the synthesized result) is explained as follows:

1) Process A reads the input, sends MsgSend(m), and waits. 

2) Process B first waits until it is allowed to execute (RecvMsg(m)). Then it performs a conditional send MsgSend(req)
and waits.

3) Process A performs RecvMsg(req), followed by a conditional send MsgSend(rsp).

4) Process B performs a conditional assignment, which assigns the value of rsp to m if m_v is empty.

We continue the case study by stating assumptions about hardware and timing; these can be specified in GECKO as
properties of the model.

1) Processes A and B run on two Texas Instruments LM3S8962 development boards¹² under FreeRTOS¹³ (a
real-time operating system), and messages are communicated over a CAN bus.

2) For each existing or FT action, its WCET on the hardware is 1ms.

3) For all messages communicated over the network, the WCMTT is 3ms.

We apply the LTM algorithm to generate timing constraints on dedicated hardware; these timing
constraints will be translated to executable C code (based on FreeRTOS). Figure 8 is used to assist the explanation of
LTM, where the variables used in the linear constraint solver are specified as follows:
• $\alpha_5$: release time for action "RecvMsg(req)" in process A.

12 http://www.luminarymicro.com/products/LM3S8962.html
13 http://www.freertos.org/ 



[Figure 7 panels: Process A (Period = 100ms) and Process B (Period = 100ms), each action listed with its PC interval and synthesized PC-precondition, together with the PC encodings of both processes.]
Figure 7. A concept illustration for the control choices in the generated game.
Figure 8. An illustration of applying LTM to the example in Sec. VIII-A and the corresponding linear constraints.



• $\alpha_6$: release time for action "if (req_v ≠ ⊥) rsp := m" (and, likewise, the deadline for "RecvMsg(req)") in
process A.

• $\alpha_7$: release time for action "if (req_v ≠ ⊥) MsgSend(rsp)" in process A.

• $\delta_1$: release time for action "if (m_v = ⊥) req := T" in process B.

• $\delta_2$: release time for action "if (m_v = ⊥) MsgSend(req)" in process B.

• $\delta_3$: release time for action "if (m_v = ⊥) m := rsp" in process B.

As process A has a time interval [40, 99) between the two existing actions MsgSend(m) and PrintOut(m),
the LTM algorithm prefers to utilize this interval rather than splitting [0, 40), as using [40, 99) requires the least
modification of the schedule. The generated linear constraint system is also shown in Figure 8. A satisfying instance
for $(\alpha_5, \alpha_6, \alpha_7, \delta_1, \delta_2, \delta_3)$ could be $(72, 77, 82, 62, 67, 87)$; instructions concerning the release time and the deadline
for the generated fault-tolerant model can be annotated based on this.
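The reported instance can be sanity-checked against a few necessary conditions that follow from items (4), (5) and (11) of Algorithm 4 together with the hardware assumptions (WCET of 1ms, WCMTT of 3ms). The sketch below is not the constraint system of Figure 8, which is not reproduced in the text; it only assumes, as our own reading, that consecutive actions of a process are ordered with room for their WCET and that each message is sent and transmitted before it is consumed. The function name and dictionary keys are hypothetical.

```python
WCET_MS = 1   # assumption 2): WCET of every existing or FT action
WCMTT_MS = 3  # assumption 3): WCMTT of every network message


def check_instance(a5, a6, a7, d1, d2, d3):
    """Necessary conditions on the release times (in ms), not the full LTM system."""
    return {
        # LTM places A's FT actions into the free interval [40, 99)
        "A in [40, 99)": all(40 <= t < 99 for t in (a5, a6, a7)),
        # consecutive actions: next release >= deadline > release + WCET
        "A ordered": a5 + WCET_MS < a6 and a6 + WCET_MS < a7,
        "B ordered": d1 + WCET_MS < d2 and d2 + WCET_MS < d3,
        # send -> receive: receiver release > sender deadline + WCMTT
        "req before RecvMsg(req)": d2 + WCET_MS + WCMTT_MS < a5,
        "rsp before m := rsp": a7 + WCET_MS + WCMTT_MS < d3,
    }
```

With the values above, MsgSend(req) released at 67 can finish by 68 and arrive by 71, just before RecvMsg(req) at 72; likewise MsgSend(rsp) at 82 arrives by 86, just before the assignment at 87, so every check in `check_instance(72, 77, 82, 62, 67, 87)` holds.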

B. Another Example 

For the second example, the user selects an inappropriate set of FT mechanisms.¹⁴ Compared to Figure 6, in process
A an equality constraint "if (req_v = ⊥)" is used instead of "if (req_v ≠ ⊥)". As a result, the combined effect of the FT
mechanisms in Example B differs dramatically from that in Example A:

• When B does not receive m from A, it sends a request command. 

• When A receives a request message, it does not send the response; this violates the original intention of the 
designer. 



This was originally a mistake we made when specifying our FT mechanism patterns; however, interesting results are generated.
(c) Constraint system by LTM
(d) Results of synthesized timing constraints

Figure 9. Screenshots of GECKO when executing the example in Sec. VIII-A.
Figure 10. Concept illustration of the overall approach for fault-tolerant synthesis; IM+FT means that an IM model is equipped with FT mechanisms.



Surprisingly, GECKO reports a positive result with an interesting sequence! All FT actions in process A
should be executed with the precondition $PC_B = 0000$, meaning that the FT mechanisms in A are executed before
RecvMsg(m) in B starts. In this way, A always sends the message MsgSend(rsp) containing the value of m, and as
at most one message loss occurs in one period, the specification is satisfied.

C. Discussion 

Concerning the running time of the above two examples, the search engine (based on forward search, with BDDs for
storing intermediate images) reports the result in 3 seconds, while constraint solving is also relatively fast (within
1 second). Our engine offers a translation scheme to dump the BDD into mechanisms in textual form; this process occupies
most of the execution time. Note that the NP-completeness result does not bring huge benefits, as another exponential
blow-up, caused by the translation from variables to states, is also unavoidable: this is why we currently use
a forward search algorithm combined with BDDs in the implementation.

Nevertheless, this does not mean that FT synthesis is impossible in practice; our argument is as follows:

1) We have shown that this method is applicable to small examples (similar to the test cases in this paper).

2) To fight complexity, we consider it important to respect the compositional (layered) approach used in the
design of embedded systems: once a system has been refined into several subsystems, our approach is more likely
to be applicable.

IX. Related Work 

Verification and synthesis of fault tolerance is an active field […]. Among existing works, the work closest to ours
is by Kulkarni et al. [16]. Here we summarize the differences in three aspects.

1) (Problem) As we are interested in real-time embedded systems, our starting model resembles existing formulations
used in the real-time community, where time is explicitly stated in the model. Their work is closer to protocol
synthesis, and their starting model is based on (a composition of) FSMs.

2) (Approach) As our original intention is to facilitate the pattern selection and tuning process, our approach does
not seek to synthesize complete FT mechanisms and can be naturally connected to games (having a set of
predefined moves). In contrast, their results focus on synthesizing complete FT mechanisms, for example voting
machines or mechanisms for the Byzantine generals' problem.

3) (Algorithm) To apply a game-based approach to embedded systems, our algorithms include the game translation
(timing abstraction) and constraint solving (for implementability). In addition, our game formulation enables us
to connect to and modify existing rich results in algorithmic game solving: for instance, we reuse the idea of the
witness from [1] for distributed games, and it is likely possible to establish connections between incomplete methods for
distributed games and algorithms for games of imperfect information [8].

A recent work by Girault et al. […] follows similar methodologies (i.e., protocol-level FSMs) to [16] and performs
discrete controller synthesis for fault tolerance; the difference between our work and theirs follows the argument above.

Lastly, we would like to comment on the application of algorithmic games. Several important works on game analysis
or LTL synthesis can be found from Bloem, Jobstmann et al. (the program repair framework [15]), Henzinger,
Chatterjee et al. (Alpaga and interface synthesis […], [10]), or David, Larsen et al. (Uppaal TIGA [3]). One
important distinction is that, due to our system modeling, we naturally start from the problem of solving distributed games
and must fight undecidability immediately, whereas the above works are all based on a non-distributed setting.

X. Concluding Remarks 



This paper presents a comprehensive approach (see Figure 10 for a concept illustration) for augmenting real-time
distributed systems with fault tolerance under a game-theoretic framework. We use simple yet close-to-reality models
(PISEM) as the starting point of FT synthesis, translate PISEM to distributed games with safe abstractions, and perform
game solving followed by implementability analysis. The above flow has been implemented in a prototype, enabling us to
perform FT synthesis within a model-based development framework. The synthesized mechanisms may have interesting
applications in distributed process control and robotics. To validate our approach, we plan to increase the maturity of
our prototype system and to study new algorithms for performance gains.
Acknowledgments: The first author is supported by the DFG Graduiertenkolleg 1480 (PUMA).
Appendix

A. The Need for Re-estimating the WCMTT in CAN Buses when FT Messages Are Introduced

Whether newly introduced FT messages change the existing network behavior depends on both the hardware and its
configuration. In this section, we only describe the behavior when FT messages are introduced into a Controller Area
Network (CAN bus), which is widely used in the automotive and automation domains. We give configuration settings
(conditions) under which newly introduced messages do not influence the existing network behavior. For details
concerning the timing analysis of CAN, we refer readers to [12], [7].

Proposition 1: Given an ideal CAN bus with message priorities 1 to k (1 being the highest priority), suppose the following
three conditions are satisfied:

1) No message in the existing network uses priority k.

2) The predefined message size for priority k is larger than the size of every message with priority 1 to k − 1.

3) All FT messages have a priority no higher than k, and their size is smaller than the message size stated in (2).

Then the WCMTT of every message with priority 1 to k, derived using the analysis in [7], is unaffected by the newly
introduced messages.

Proof: (Outline) By the analysis in [7], the timing behavior of a message with priority i ∈ {1, ..., k} can change
only through two factors:

• (a) The blocking time caused by messages with lower priority j > i: when a lower-priority message becomes larger,
the blocking time increases.

• (b) The interference from messages with higher priority j < i.
We proceed with the argument as follows.

• For timing changes due to (b): since all FT messages have lower priority (condition 3), they neither create nor
increase interference of this type.

• For timing changes due to (a), we distinguish two cases:

- Since every FT message is no larger than the message size specified in (2), the timing behavior of messages with
priority 1 to k − 1 does not change.

- Although the timing of a message with priority k could change, as it can now be blocked by a lower-priority message,
no such message exists by condition (1).

■ 
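The proof outline can be checked mechanically on a toy configuration. The following Python sketch computes, for each priority i, the blocking term (largest lower-priority frame) and the set of higher-priority interferers that appear in standard CAN response-time analysis, and compares them before and after adding FT messages. It is a simplification under stated assumptions: frame sizes are abstract units assumed to be monotone in transmission time, jitter and bit stuffing are ignored, condition (3) is read as "FT messages use the lowest bus priority k", and the reserved priority-k frame of condition (2) is modeled as a placeholder already accounted for in the original analysis. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Msg:
    name: str
    priority: int  # 1 = highest priority; a larger number means lower priority
    size: int      # abstract frame size, assumed monotone in transmission time


def blocking(msgs: List[Msg], i: int) -> int:
    """Blocking term for priority i: the largest frame with lower priority j > i."""
    return max((m.size for m in msgs if m.priority > i), default=0)


def interferers(msgs: List[Msg], i: int) -> List[str]:
    """Interference term for priority i: names of messages with higher priority j < i."""
    return sorted(m.name for m in msgs if m.priority < i)


def conditions_hold(existing: List[Msg], ft: List[Msg], k: int, reserved_size: int) -> bool:
    """Conditions (1)-(3) of Proposition 1, as interpreted in this sketch."""
    c1 = all(m.priority != k for m in existing)
    c2 = all(reserved_size > m.size for m in existing if m.priority < k)
    c3 = all(m.priority >= k and m.size < reserved_size for m in ft)
    return c1 and c2 and c3


def timing_terms_unchanged(existing: List[Msg], ft: List[Msg], k: int, reserved_size: int) -> bool:
    """Compare blocking and interference terms of priorities 1..k-1 before/after adding FT messages."""
    # Assumption: the original analysis already includes a placeholder frame of the
    # predefined size at the reserved priority k (no real message uses it).
    before = existing + [Msg("reserved_k", k, reserved_size)]
    after = before + ft
    return all(
        blocking(before, i) == blocking(after, i)
        and interferers(before, i) == interferers(after, i)
        for i in range(1, k)
    )


existing = [Msg("m1", 1, 4), Msg("m2", 2, 2), Msg("m3", 3, 3)]
ft_ok = [Msg("ft_req", 4, 5), Msg("ft_rsp", 4, 7)]
ft_too_big = [Msg("ft_big", 4, 9)]

print(conditions_hold(existing, ft_ok, 4, 8))              # True
print(timing_terms_unchanged(existing, ft_ok, 4, 8))       # True
print(conditions_hold(existing, ft_too_big, 4, 8))         # False
print(timing_terms_unchanged(existing, ft_too_big, 4, 8))  # False
```

The second configuration violates condition (3) because its FT frame exceeds the reserved size, and the blocking term of every higher-priority message grows accordingly.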

Based on the above result, our framework may assume that all newly introduced FT messages transmitted on a CAN bus have
the lowest priority k + 1, and then perform a simple timing analysis at Step A.2.2 before creating the game; in this way,
the problem in Step A.2.2 ([Problem 2]) can safely be neglected.

B. Algorithm Modification for Interleaving of Local Games 



Algorithm 5: PositionalDistributedStrategy_ControlInLocalGameInterleaving_BoundedSAT_0

Data: Distributed game graph G = (V_0 ⊎ V_1, E), set of initial states V_init, set of target states V_goal, the
unrolling depth d

Result: whether a distributed positional strategy exists to reach V_goal from V_init
begin
    let clauseList := getEmptyList()   /* store all clauses for the SAT solver */
    execute STEP 1 to STEP 5 of PositionalDistributedStrategy_BoundedSAT_0
    /* STEP 6: impact of control edge selection */
    for local control transition e = (x_i, x_i') ∈ E_i, x_i ∈ V_0^i do
        for v = (v_1, ..., v_m) ∈ V_0 ⊎ V_1 where x_i = v_i do
            for j = 1, ..., d − 1 do
                clauseList.add( e ⇒ ( (v_1, ..., v_i, v_{i+1}, ..., v_m)_j ⇒
                    ( (v_1, ..., x_i', v_{i+1}, ..., v_m)_{j+1} ∨ ... ∨ (v_1, ..., x_i', v_{i+1}, ..., v_m)_d ) ) )
    /* STEP 7: impact of environment vertex */
    for environment vertex v = (v_1, ..., v_m) ∈ V_1 do
        let succ(v) be the set of successors of v
        for j = 1, ..., d − 1 do
            clauseList.add( (v)_j ⇒ ⋀_{v' ∈ succ(v)} ( (v')_{j+1} ∨ ... ∨ (v')_d ) )
    /* STEP 8: invoke the SAT solver; return true when satisfiable */
    return invokeSATsolver(clauseList)
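The clause generation of STEP 6 and STEP 7 can be sketched in Python. This is a minimal sketch under several assumptions: global vertices are tuples of local states, subgame indices are 0-based (unlike the 1-based i above), a clause is a CNF disjunction given as a list of (variable, polarity) literals, and the names visited, later_visits, step6_clauses and step7_clauses are hypothetical. STEPs 1 to 5 and the SAT-solver call are not reproduced.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[str, ...]  # global vertex (v_1, ..., v_m): one local state per subgame
Var = Tuple               # propositional variable, encoded as a tuple
Lit = Tuple[Var, bool]    # (variable, polarity); False means negated
Clause = List[Lit]        # disjunction of literals


def visited(v: Vertex, j: int) -> Var:
    """Hypothetical variable '(v)_j': global vertex v is visited at unrolling step j."""
    return ("at", v, j)


def later_visits(v: Vertex, j: int, d: int) -> Clause:
    """Positive literals (v)_{j+1}, ..., (v)_d."""
    return [(visited(v, t), True) for t in range(j + 1, d + 1)]


def step6_clauses(local_edges: List[Tuple[int, str, str]],
                  vertices: List[Vertex], d: int) -> List[Clause]:
    """STEP 6: if local control edge e = (x_i, x_i') is selected and a vertex v with
    v_i = x_i is visited at step j, the vertex with v_i replaced by x_i' is visited later."""
    clauses: List[Clause] = []
    for i, x_i, x_next in local_edges:
        e = ("edge", i, x_i, x_next)
        for v in vertices:
            if v[i] != x_i:
                continue
            succ = v[:i] + (x_next,) + v[i + 1:]
            for j in range(1, d):
                # e => (v_j => succ_{j+1} or ... or succ_d) is the clause
                # (not e) or (not v_j) or succ_{j+1} or ... or succ_d
                clauses.append([(e, False), (visited(v, j), False)] + later_visits(succ, j, d))
    return clauses


def step7_clauses(env_successors: Dict[Vertex, List[Vertex]], d: int) -> List[Clause]:
    """STEP 7: a visited environment vertex forces every successor to be visited later;
    the conjunction over successors becomes one clause per successor."""
    clauses: List[Clause] = []
    for v, succs in env_successors.items():
        for j in range(1, d):
            for s in succs:
                clauses.append([(visited(v, j), False)] + later_visits(s, j, d))
    return clauses


vertices = [("a", "p"), ("b", "p")]
c6 = step6_clauses([(0, "a", "b")], vertices, d=3)
c7 = step7_clauses({("b", "q"): [("a", "q"), ("b", "p")]}, d=3)
print(len(c6), len(c7))  # 2 4
```

Splitting the conjunction of STEP 7 into one clause per successor keeps the output in CNF, which is the input form most SAT solvers expect.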



(Remark) Compared with Mohalik and Walukiewicz's formulation, in this formulation only one subgame in a control
position can make a move; hence in STEP 6 we do not need to create constraints covering all possible combinations of
local moves, as is required in the algorithm PositionalDistributedStrategy_BoundedSAT_0 (sec. VI).
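The saving can be illustrated by counting candidate moves at a single global control position. The Python sketch below assumes that, in a simultaneous-move reading, every subgame at a control position picks one of its local control edges, whereas in the interleaving formulation exactly one subgame moves; the edge names and both helper functions are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def joint_moves(local_choices: List[List[str]]) -> List[Tuple[str, ...]]:
    """All combinations when every subgame in a control position moves together."""
    return list(product(*local_choices))


def interleaved_moves(local_choices: List[List[str]]) -> List[Tuple[int, str]]:
    """Moves when only one subgame moves per step: (subgame index, local edge)."""
    return [(i, e) for i, edges in enumerate(local_choices) for e in edges]


local_choices = [["e1", "e2"], ["f1", "f2", "f3"], ["g1", "g2"]]
print(len(joint_moves(local_choices)))        # 12
print(len(interleaved_moves(local_choices)))  # 7
```

The joint count grows as the product of the local choices, while the interleaved count grows only as their sum.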



C. Brief Instructions on Executing Examples in Gecko 

Here we illustrate how FT synthesis is done in our prototype tool chain using the example in sec. VIII-A. First, we
perform model transformation and generate a new model equipped with FT mechanisms. Then, optionally, executable code can
be generated by performing code generation on that model. Once the Gecko Eclipse add-on is installed (see our website
for instructions), proceed with the following steps:

• The model (F01_FT_Synthesis_Correct.xmi) for sec. VIII-A contains the fault model, the hardware used
in the system, and pre-inserted FT mechanism blocks whose timing information is still unknown.

• Right-click on the model under synthesis and choose "Verification" ->
"Gecko: Model Transformation and Analysis". A pop-up window similar to fig. 9 appears.

• In the General tab, choose Symbolic FT synthesis using 
algorithmic game theory. 

• In the Platform Analysis tab, set the default actor WCET and network WCMTT to 1 and 3, respectively.

• In the Output tab, select the newly generated output file. 

• Press "Finish". Results of intermediate steps are shown in the console, including the FT mechanisms as interleaving
models, the constraints derived from LTM, and the timing results after executing the constraint solver (fig. 9).

• In fig. 9, the mechanism dumped from the engine specifies the action
"if (req_v ≠ ⊥) MsgSend(rsp)". Note that this action implicitly means that when req_v = ⊥, a null-op that
only updates the program counter should be executed; this is captured by our synthesis framework.

• In fig. 9, the total execution time is roughly 18 s, mostly because the engine dumps the result back into mechanisms
in textual form, which is time-consuming; executing the game and performing constraint solving take only a small
portion of the total time.

• Once the model is generated, users can again right-click on the newly generated model and select Code Generation
in the General tab: the code generator then combines the model description and software templates for dedicated
hardware and OS to create executable C code.
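The implicit null-op in the dumped mechanism can be made explicit in a small Python sketch. It assumes that the action reads as a send guarded by req_v ≠ ⊥ and models ⊥ as None; the class ProcState, the function guarded_send, and the outbox list standing in for MsgSend are hypothetical and are not part of Gecko or its generated C code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BOTTOM = None  # hypothetical encoding of ⊥ ("no request received")


@dataclass
class ProcState:
    pc: int = 0                                       # program counter
    req_v: Optional[str] = BOTTOM                     # last request value, or ⊥
    outbox: List[str] = field(default_factory=list)   # stand-in for sent messages


def guarded_send(s: ProcState) -> None:
    """Execute 'if (req_v != ⊥) MsgSend(rsp)' as one atomic step.

    When the guard is false, the step is a null-op that only advances
    the program counter.
    """
    if s.req_v is not BOTTOM:
        s.outbox.append("rsp")  # stand-in for MsgSend(rsp)
    s.pc += 1


idle = ProcState()
guarded_send(idle)
print(idle.pc, idle.outbox)  # 1 []

busy = ProcState(req_v="req")
guarded_send(busy)
print(busy.pc, busy.outbox)  # 1 ['rsp']
```

In both cases the program counter advances by one, so the guarded action occupies the same slot in the schedule whether or not a response is sent.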



